feat(document-api): implement doc.extract() for RAG content extraction (SD-2525) by caio-pizzol · Pull Request #2774 · superdoc-dev/superdoc

caio-pizzol · 2026-04-10T17:25:38Z

Single API method that extracts all document content with stable IDs for RAG pipelines.

- editor.doc.extract() returns blocks with full text, comments with anchored block references, and tracked changes with excerpts
- Every ID works directly with scrollToElement() for citation navigation
- Follows the Document API contract pattern (4 touch points: operation-definitions, registry, schemas, dispatch)
- No arbitrary limits: returns all blocks in document order
- Full text per block (not the 80-char textPreview from blocks.list)

Usage:

const { blocks, comments, trackedChanges } = editor.doc.extract();

// Store IDs alongside embeddings
const chunks = blocks.map(b => ({ id: b.nodeId, text: b.text }));

// Navigate back on citation click (`chunk` is one of the entries stored in `chunks`)
await superdoc.scrollToElement(chunk.id);
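
A slightly fuller sketch of the store-then-navigate flow above, written in TypeScript. The `ExtractedBlock` shape is an assumption limited to the two fields the usage snippet reads (`nodeId`, `text`). `buildChunks`, `indexChunks`, and the sample data are hypothetical helpers for illustration and are not part of SuperDoc.

```typescript
// Assumed minimal shape: only the two fields the usage snippet reads.
interface ExtractedBlock {
  nodeId: string;
  text: string;
}

interface Chunk {
  id: string;
  text: string;
}

// Hypothetical helper: turn extracted blocks into chunks,
// skipping blocks that have no text worth embedding.
function buildChunks(blocks: ExtractedBlock[]): Chunk[] {
  return blocks
    .filter((b) => b.text.trim().length > 0)
    .map((b) => ({ id: b.nodeId, text: b.text }));
}

// Hypothetical helper: index chunks by ID so a cited ID
// can be resolved back to its chunk with a single lookup.
function indexChunks(chunks: Chunk[]): Map<string, Chunk> {
  return new Map(chunks.map((c) => [c.id, c] as [string, Chunk]));
}

// Sample data standing in for `editor.doc.extract().blocks`.
const sampleBlocks: ExtractedBlock[] = [
  { nodeId: "p1", text: "Termination requires 30 days notice." },
  { nodeId: "p2", text: "   " },
  { nodeId: "p3", text: "Governing law: Delaware." },
];

const chunks = buildChunks(sampleBlocks);
const chunkIndex = indexChunks(chunks);
```

On a citation click, the cited ID can be checked against `chunkIndex` and then passed to `superdoc.scrollToElement()` as in the usage above.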

Closes SD-2525

linear · 2026-04-10T17:25:45Z

SD-2525 Document content extraction API for RAG pipelines

mintlify · 2026-04-10T17:25:49Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
SuperDoc	🟢 Ready	View Preview	Apr 10, 2026, 5:26 PM


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e4827d3ab


tests/behavior/tests/navigation/extract.spec.ts

packages/super-editor/src/editors/v1/document-api-adapters/extract-adapter.ts

…n (SD-2525) Single API method that returns all document content with stable IDs — blocks with full text, comments with anchored block references, and tracked changes with excerpts. Every ID works directly with scrollToElement() for citation navigation.

- Use canonical getHeadingLevel() instead of divergent local regex
- Reuse collectTopLevelBlocks() instead of duplicating block traversal
- Add required fields to extract output JSON schema
- Remove fixture-only unit tests that don't call executeExtract
- Add behavior tests: headings, comments, tracked changes, scrollToElement round-trip


superdoc-bot · 2026-04-10T21:11:29Z

🎉 This PR is included in @superdoc-dev/react v1.0.0-next.38

The release is available on GitHub release

superdoc-bot · 2026-04-10T21:11:35Z

🎉 This PR is included in esign v2.2.0-next.42

The release is available on GitHub release

superdoc-bot · 2026-04-10T21:11:37Z

🎉 This PR is included in vscode-ext v1.1.0-next.84

superdoc-bot · 2026-04-10T21:11:41Z

🎉 This PR is included in template-builder v1.3.0-next.44

The release is available on GitHub release

superdoc-bot · 2026-04-10T21:12:55Z

🎉 This PR is included in superdoc v1.24.0-next.81

The release is available on GitHub release

superdoc-bot · 2026-04-10T21:13:06Z

🎉 This PR is included in superdoc-cli v0.5.0-next.82

The release is available on GitHub release

superdoc-bot · 2026-04-10T21:15:05Z

🎉 This PR is included in superdoc-sdk v1.3.0-next.83

…n (SD-2525) (#2774)

* feat(document-api): implement doc.extract() for RAG content extraction (SD-2525)

  Single API method that returns all document content with stable IDs — blocks with full text, comments with anchored block references, and tracked changes with excerpts. Every ID works directly with scrollToElement() for citation navigation.

* fix(document-api): review fixes — heading regex, schema required, tests

  - Use canonical getHeadingLevel() instead of divergent local regex
  - Reuse collectTopLevelBlocks() instead of duplicating block traversal
  - Add required fields to extract output JSON schema
  - Remove fixture-only unit tests that don't call executeExtract
  - Add behavior tests: headings, comments, tracked changes, scrollToElement round-trip

* fix(tests): remove superdoc.click() — fixture uses type() for focus

* fix(cli): add extract operation hints for CLI/SDK wiring

superdoc-bot · 2026-04-10T23:00:54Z

🎉 This PR is included in vscode-ext v2.3.0-next.1

superdoc-bot · 2026-04-10T23:01:19Z

🎉 This PR is included in template-builder v1.5.0-next.1

The release is available on GitHub release

superdoc-bot · 2026-04-10T23:01:22Z

🎉 This PR is included in esign v2.3.0-next.1

The release is available on GitHub release

superdoc-bot · 2026-04-10T23:02:09Z

🎉 This PR is included in superdoc v1.26.0-next.1

The release is available on GitHub release

superdoc-bot · 2026-04-10T23:02:11Z

🎉 This PR is included in superdoc-cli v0.7.0-next.1

The release is available on GitHub release

superdoc-bot bot added the review: quick label Apr 10, 2026

mintlify bot deployed to staging - apps/docs April 10, 2026 17:26 View deployment

chatgpt-codex-connector bot reviewed Apr 10, 2026

tests/behavior/tests/navigation/extract.spec.ts (resolved)

packages/super-editor/src/editors/v1/document-api-adapters/extract-adapter.ts (outdated, resolved)

caio-pizzol force-pushed the caio/sd-2525-document-content-extraction-api-for-rag-pipelines branch from 7e4827d to 5cb7735 Compare April 10, 2026 17:44

mintlify bot deployed to staging - apps/docs April 10, 2026 17:45 View deployment

caio-pizzol added 3 commits April 10, 2026 11:45

fix(tests): remove superdoc.click() — fixture uses type() for focus

565e4a3

caio-pizzol force-pushed the caio/sd-2525-document-content-extraction-api-for-rag-pipelines branch from beb1257 to 565e4a3 Compare April 10, 2026 18:46

mintlify bot deployed to staging - apps/docs April 10, 2026 18:47 View deployment

fix(cli): add extract operation hints for CLI/SDK wiring

10b1403

caio-pizzol force-pushed the caio/sd-2525-document-content-extraction-api-for-rag-pipelines branch from 332999b to 10b1403 Compare April 10, 2026 20:39

mintlify bot deployed to staging - apps/docs April 10, 2026 20:40 View deployment

Merge branch 'main' into caio/sd-2525-document-content-extraction-api…

c6db53b

…-for-rag-pipelines

caio-pizzol added this pull request to the merge queue Apr 10, 2026

Merged via the queue into main with commit c2f2577 Apr 10, 2026
53 of 56 checks passed

caio-pizzol deleted the caio/sd-2525-document-content-extraction-api-for-rag-pipelines branch April 10, 2026 21:09

superdoc-bot bot added the released on @next label Apr 10, 2026

caio-pizzol mentioned this pull request Apr 10, 2026

feat(rag): replace fragmented extraction and text-search navigation (SD-2522) superdoc-dev/demos#1

Merged

caio-pizzol mentioned this pull request Apr 10, 2026

feat: bookmark api for cross session document addressing (SD-2358) #2663

Closed
