Skip to content

fix(diffing): strip session-local sdBlockId from diff fingerprint (SD-3279)#3527

Merged
caio-pizzol merged 5 commits into
mainfrom
caio-pizzol/sd-3279-strip-sdblockid-from-diff-fingerprint
May 27, 2026
Merged

fix(diffing): strip session-local sdBlockId from diff fingerprint (SD-3279)#3527
caio-pizzol merged 5 commits into
mainfrom
caio-pizzol/sd-3279-strip-sdblockid-from-diff-fingerprint

Conversation

@caio-pizzol
Copy link
Copy Markdown
Contributor

@caio-pizzol caio-pizzol commented May 27, 2026

Two SuperDoc editor instances loaded from the same DOCX assign different sdBlockId UUIDs at editor startup, so the diff fingerprint differs across instances and editor.doc.diff.apply from one editor to another throws PRECONDITION_FAILED. This blocks the cross-editor preview pattern that customer revision-history UIs want to build (IT-1116).

The attribute-diff path already treats sdBlockId as non-semantic (IGNORED_ATTRIBUTE_KEYS), and the table-cell NO_OP comparator already lists sdBlockId / sdBlockRev / paraId / textId / rsid* as identity attrs. The bug was internal drift: the fingerprint normalization only stripped paragraph-level paraId / textId / rsid* and never touched sdBlockId / sdBlockRev, on paragraphs or anywhere else.

  • Hoists the identity-attr list into one shared constant (NON_SEMANTIC_BLOCK_ATTRS in extensions/diffing/algorithm/identity-attrs.ts), consumed by both the diff fingerprint normalization and the table-cell NO_OP comparator. One source of truth so the lists cannot drift again.
  • Strips identity attrs from every block node (paragraphs, tables, rows, cells, sections), not just paragraphs. sdBlockId lives on all of those.
  • Adds a regression marker test for the export → reimport flow, which still fails after this fix. Tracked separately as SD-3282; the marker test currently asserts the known failure mode so the gap is documented in-tree.

Backward compatibility for persisted artifacts

The normalization change would invalidate any DiffSnapshot or DiffPayload already persisted under the pre-SD-3279 algorithm (the customer's revision-history workflow stores DiffSnapshot blobs). To avoid breaking existing artifacts:

  • compareToSnapshot and applyDiffPayload re-derive with the current normalizer first, then fall back to a legacy normalizer that reproduces the pre-fix algorithm exactly. Either match accepts the artifact. Genuinely tampered snapshots fail both checks.
  • Scope of recovery: legacy DiffSnapshots are fully recovered. Legacy DiffPayloads are recovered only for same-editor reapply where sdBlockId values are still intact; cross-session reapply of old payloads was already broken pre-fix and the fallback can't retroactively rescue it.
  • No snapshot/payload version bump — the legacy normalizer lives behind the validation path only; the forward path is unchanged.

Verified: 257/257 diffing tests pass. pnpm check:types clean. Cross-editor handoff test fails pre-fix with fingerprint mismatch, passes post-fix. Same-editor regression guard still passes. Legacy snapshot accepted by compareToSnapshot; legacy same-editor diff payload accepted by applyDiffPayload; genuinely tampered snapshot still rejected. The 11 image-move conformance failures in contract-conformance.test.ts pre-existed on main; confirmed unrelated to this change. Added doc-api story covering cross-session diff handoff through doc.diff.capture / compare / apply.

…-3279)

`editor.doc.diff.apply` checks that the editor it runs against has the
same canonical fingerprint as the editor that produced the diff. Two
SuperDoc editor instances loaded from the same DOCX assign different
`sdBlockId` UUIDs at editor startup, so their canonical states diverge
and the cross-editor handoff fails with PRECONDITION_FAILED. The
attribute-diff path already treats `sdBlockId` as non-semantic
(IGNORED_ATTRIBUTE_KEYS in attributes-diffing.ts), and the table-cell
NO_OP comparator already lists `sdBlockId` + `sdBlockRev` + paraId +
textId + rsid* as identity attrs. The bug was internal drift: the diff
fingerprint normalization only stripped paraId / textId / rsid* from
paragraph attrs and ignored sdBlockId / sdBlockRev entirely.

Hoist the identity-attr list into a shared constant
(`NON_SEMANTIC_BLOCK_ATTRS` in
`extensions/diffing/algorithm/identity-attrs.ts`) and use it from both
consumers. Apply the strip to every block node, not just paragraphs —
`sdBlockId` lives on tables, rows, cells, and sections too. Tracked
sdBlockRev as adjacent hardening (same identity-attr category, already
in the table comparator's list); the diagnostic only observed
sdBlockId divergence in the two-editor snapshot payloads.

Tests:
- Canonicalization: `normalizeDocJSON` now strips identity attrs from
  non-paragraph block nodes; two doc trees that differ only in
  sdBlockId values normalize to the same shape.
- Fingerprint: `buildCanonicalDiffableState` produces the same
  fingerprint for body trees that differ only in identity attrs.
- Cross-editor handoff: a diff produced by one editor instance applies
  cleanly to a second editor instance with the same content (the
  customer's preview-pane pattern from IT-1116).
- Same-editor regression guard: capture / mutate / compare / apply on
  one editor still works.
- Export → reimport regression marker: covers the second canonical
  divergence layer beyond sdBlockId. Currently asserts the known
  failure mode; flip to a success assertion when SD-3282 is fixed.

Verified:
- 252/252 diffing tests pass.
- `pnpm check:types` clean.
- 11 image-move conformance failures pre-existed on main; confirmed
  unrelated to this change by stashing and re-running.
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 27, 2026

SD-3279

SD-3282

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Locks in the public Document API surface for the customer's
preview-pane pattern (IT-1116). Three sessions, base and preview
sharing identical content; target produces the snapshot; main session
computes the diff; the diff is applied to the *separately-opened*
preview session as tracked changes.

Pre-SD-3279 this throws PRECONDITION_FAILED because the two sessions'
canonical fingerprints diverge on session-local sdBlockId values. The
service-level integration test in diff-service.test.ts already covers
this; the story sits at the SDK/CLI dispatch tier that customers
actually use, so a regression in the public path would be caught even
if the internal helper kept passing.

Verified end-to-end via vitest run against the rebuilt CLI + SDK on
the fix branch.
The earlier story commit verified apply result + tracked changes total.
Adds the customer-visible side: the preview session's body text now
contains the target paragraph after the diff is applied. Same single
test, one extra assertion. Catches a class of regression where the
apply succeeds in tracking-changes accounting but never lands the
underlying content.
@caio-pizzol caio-pizzol marked this pull request as ready for review May 27, 2026 13:58
@caio-pizzol caio-pizzol requested a review from a team as a code owner May 27, 2026 13:58
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d643338c2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

No issues found across 7 files

Linked issue analysis

Linked issue: SD-3279: diff.apply can fingerprint-mismatch across editor instances loaded from the same DOCX

Status Acceptance criteria Notes
Strip session-local sdBlockId (and sdBlockRev) from the diff fingerprint normalization so fingerprint does not vary across editor sessions. A shared NON_SEMANTIC_BLOCK_ATTRS constant now includes sdBlockId and sdBlockRev and is used by the normalization code; unit tests assert normalized output/fingerprint is identical when only sdBlockId/sdBlockRev differ.
Strip identity-only attributes from every block node (paragraphs, tables, rows, cells, sections) rather than only paragraphs so fingerprints are stable for all block types. normalizeDocNodeJSON now applies block-attr normalization to container nodes and tests verify identity attrs are stripped from table/row/cell/section nodes.
Unify the identity-attr list into one shared constant consumed by both the diff fingerprint normalization and the cell-content NO_OP comparator to prevent drift. New NON_SEMANTIC_BLOCK_ATTRS constant is added and consumed from identity-attrs.ts; table adapter now imports that set instead of duplicating the list.
Cross-editor handoff: a diff produced by one editor instance can be applied to a second editor instance (same DOCX content) in tracked-change mode and produces tracked-change operations. The PR adds tests that exercise capture/compare/apply across separate editor instances and the PR description and tests indicate the cross-editor handoff test passes post-fix.
Preserve same-editor capture/compare/apply behavior (regression guard). Tests explicitly include a same-editor regression guard and the PR notes same-editor path still passes; test asserts tracked changes are applied and content matches target.

Re-trigger cubic

The previous SD-3279 commit changed the canonical normalization (stripped
sdBlockId / sdBlockRev) without bumping the snapshot/payload version.
Any DiffSnapshot persisted by a customer under the pre-fix algorithm
would now be rejected as tampered when re-derivation runs through the
new normalizer. Same risk for DiffPayloads in same-session reapply.

Add a parallel legacy normalization path that reproduces the pre-fix
behavior exactly (paragraph-only attribute stripping over the original
7-element set, no stripping on structural containers, image
originalAttributes stripping unchanged). compareToSnapshot and
applyDiffPayload now accept either the new or the legacy fingerprint:
re-derive with the current normalizer first; on mismatch, retry with
the legacy normalizer. Genuinely tampered artifacts fail both checks
and still throw.

Scope of recovery:
- DiffSnapshot persisted under old algorithm: fully recovered. This is
  the customer's revision-history workflow (capture and store).
- DiffPayload persisted under old algorithm: recovered only for
  same-editor reapply where the editor still holds the original
  sdBlockIds. Cross-session reapply of old payloads was already broken
  by per-session sdBlockId divergence — that's the exact bug SD-3279
  fixes for new snapshots; the legacy fallback can't retroactively
  rescue it.

Tests added:
- diff-service.test.ts: legacy snapshot accepted by compareToSnapshot;
  legacy same-editor payload accepted by applyDiffPayload; genuinely
  tampered snapshot still rejected.
- fingerprint.test.ts: legacy normalizer produces a different
  fingerprint than the current normalizer for docs with sdBlockId
  (otherwise the fallback would be dead code).
- diff-service.test.ts: SD-3282 ticket reference added to the existing
  export → reimport regression marker comment.

Verified: 257/257 diffing tests pass; pnpm check:types clean.
Replay previously called setNodeMarkup with the diff's full source attrs,
overwriting the recipient editor's session-local sdBlockId / sdBlockRev with
the originator's values. Merge the AttributesDiff onto the recipient node's
existing attrs so any path the diff does not touch is preserved verbatim.
Also pass NON_SEMANTIC_BLOCK_ATTRS as ignoreKeys for non-paragraph block
attr diffs; paragraphs were already covered via normalizeParagraphAttrs.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

No issues found across 16 files

Linked issue analysis

Linked issue: SD-3279: diff.apply can fingerprint-mismatch across editor instances loaded from the same DOCX

Status Acceptance criteria Notes
Remove session-local sdBlockId/sdBlockRev (and related identity attrs) from the diff/fingerprint normalization for all block nodes (paragraphs, tables, rows, cells, sections). Normalization was extended to strip the non-semantic block attrs from every block node and tests validate normalized outputs are stable across differing sdBlockId values.
Hoist the identity-attr list into a single shared constant and consume it from both the fingerprint normalizer and NO_OP comparators to prevent drift. A single NON_SEMANTIC_BLOCK_ATTRS constant was added and imported where needed; previous duplicated lists were removed/replaced.
Ensure replay/apply does not overwrite the recipient editor's session-local identity attrs when applying attribute diffs (preserve recipient sdBlockId/sdBlockRev). An applyAttrsDiff helper was implemented and replay paths now merge diffs onto the recipient node attrs instead of replacing them; unit tests assert recipient sdBlockId is preserved.
Fix the customer-visible bug: a diff produced by one editor instance can be applied to a separately-opened editor instance loaded from the same DOCX (cross-editor capture → compare → apply works). End-to-end tests exercising cross-editor handoff were added and assert that a diff captured from one editor can be applied to another editor with the same content; this is the primary reported failure and the test passes per PR verification.
Preserve backward-compatibility: accept snapshots and same-editor diff payloads captured under the pre-fix algorithm via a legacy normalizer fallback (without changing forward path behavior). Legacy normalization functions and a legacy canonical-state builder were added; compareToSnapshot and applyDiffPayload retry fingerprint validation with the legacy canonicalizer before rejecting; tests cover accepting legacy snapshots and same-editor legacy payloads and rejecting tampered snapshots.
Add a regression marker for the known export → reimport fingerprint divergence (document as SD-3282) without changing scope of this PR. A regression marker test was intentionally added expecting the export→reimport path to still throw a fingerprint mismatch; the test documents the known limitation and is present in the test suite.

Re-trigger cubic

@caio-pizzol caio-pizzol self-assigned this May 27, 2026
@caio-pizzol caio-pizzol merged commit 760ab39 into main May 27, 2026
68 of 69 checks passed
@caio-pizzol caio-pizzol deleted the caio-pizzol/sd-3279-strip-sdblockid-from-diff-fingerprint branch May 27, 2026 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants