Skip to content

fix(super-editor): persist data-URI images set as SDT preset content (SD-3116)#3516

Open
luccas-harbour wants to merge 50 commits into
luccas/sd-3258-bug-image-resizing-inside-structured-contentfrom
luccas/sd-3116-bug-image-set-as-presetcontent-in-sdt-field-does-not-persist
Open

fix(super-editor): persist data-URI images set as SDT preset content (SD-3116)#3516
luccas-harbour wants to merge 50 commits into
luccas/sd-3258-bug-image-resizing-inside-structured-contentfrom
luccas/sd-3116-bug-image-set-as-presetcontent-in-sdt-field-does-not-persist

Conversation

@luccas-harbour
Copy link
Copy Markdown
Contributor

Summary

Images set as presetContent inside a structured-content (SDT) field – most commonly SVG signatures and other preset graphics – were being dropped on DOCX export. The root cause was a chain of assumptions about data-URI images that only held for inline-pasted, base64 PNG/JPEG content:

  • the renderer's image data-URL allowlist was hard-coded to base64, payloads, so URL-encoded SVGs (the format produced by most signature widgets) were rejected at paint time;
  • the registration plugin always routed data-URI sources through the canvas resize pipeline, which strips vector content and produced no usable file for SVGs;
  • the DOCX exporter looked up the image's package path with src.split('word/')[1], which returns undefined for a data: URI, so the relationship target was never written and the <a:blip> lost its image reference;
  • non-base64 payloads and field-annotation images had no shared validation, so the editor surface, the painter, and the exporter all disagreed about which data URIs were safe.

This PR consolidates the data-URI policy into shared/url-validation, threads it through the painter, importer, registration plugin, and DOCX exporter, and adds a roundtrip test covering preset-content insertion β†’ paint β†’ export β†’ re-import.

Highlights

Shared data-URI policy (shared/url-validation)

  • New exports: getDataUriMetadata, tryDecodeDataUriText, isValidImageDataUrl, plus IMAGE_DATA_URL_MIME_TYPES and MAX_IMAGE_DATA_URL_LENGTH.
  • Base64 payloads remain allowed for every supported image MIME type. Non-base64 payloads are accepted only for image/svg+xml, and only when the percent-encoded text decodes successfully. 10 MB cap is enforced uniformly.

Painter (layout-engine/painters/dom)

  • Inline image and field-annotation image rendering both delegate to isValidImageDataUrl. Removed the local VALID_IMAGE_DATA_URL regex / MAX_DATA_URL_LENGTH constant.
  • New tests cover non-base64 SVG rendering on both code paths.

Image registration plugin (super-editor/extensions/image)

  • SVG data URIs with finite intrinsic sizes are now registered in place, skipping the canvas resize step entirely.
  • getDataUriDecodedByteLength enforces the upload byte cap before in-place registration (handles both base64 and percent-encoded payloads).
  • Media registration mirrors entries to the parent editor's media store so child editors don't lose images at the parent boundary.
  • Async path: SVGs under the size cap upload directly without canvas reprocessing.

DOCX exporter (v3/handlers/wp/helpers/decode-image-node-helpers.js)

  • New createMediaTargetForDataUri allocates a stable word/media/image-<hash>.<ext> package path for each data-URI source, caches the mapping per export (params.dataUriMediaTargets), and resolves rId collisions by appending a random suffix when two distinct sources hash to the same path.
  • Relationship lookup is unified across header/footer and document parts (resolveImageRelationshipId + getImageRelationshipLookup).
  • Field-annotation export now validates the data URI before writing a relationship, falling back to text rendering when invalid.
  • Warns (instead of silently dropping) when a media target can't be resolved.

pm-adapter

  • resolveNodeSdtMetadata is now generic in its override type; callers like fieldAnnotationNodeToRun get the precise metadata type without a cast.
  • Empty-inline-SDT placeholder emission is now scoped to structuredContent metadata only β€” other inline SDT variants no longer collapse into a placeholder when empty.

Importer/helpers cleanup

  • helpers.js#dataUriToArrayBuffer accepts URL-encoded SVG payloads.
  • image-dimensions.js reads SVG intrinsic dimensions from <svg width/height> attributes (base64 and percent-encoded).
  • Shared simpleStringHash / stableHexHash helpers moved to core/utilities/hash.js; documentCommentsImporter and handleBase64 now reuse them.
  • mediaHelpers.js centralizes MIME β†’ extension mapping (getImageExtensionFromMimeType) and re-exports shared metadata helpers with the extension annotation.

Tests

  • sd-3116-structured-content-image-roundtrip.test.js β€” 453 LOC suite covering preset-content insertion, paint, save, and re-import for both base64 and percent-encoded SVG signatures plus PNG cases.
  • structured-content-commands.test.js β€” verifies insertStructuredContentBlock registers the preset image in media and rewrites src to a word/media/... path.
  • imageRegistrationPlugin.browser.test.js β€” in-place SVG registration, parent-store mirroring, and upload cap.
  • handleBase64.test.js, image-dimensions.test.js, mediaHelpers.test.js, helpers.test.js, decode-image-node-helpers.test.js β€” coverage for non-base64 SVG paths, malformed payloads, MIME normalization, and exporter target collisions.
  • painters/dom/src/index.test.ts β€” non-base64 SVG rendering for both image runs and field annotations.

Risk / compatibility

  • Renderer policy changes are additive (accepts more valid URIs, rejects fewer) for base64 PNG/JPEG/etc. β€” existing assets render the same.
  • The exporter previously emitted broken relationships for data-URI images; output now contains real media parts. Any consumer that relied on the broken behaviour (unlikely) would notice extra word/media/image-*.svg parts.
  • Per-conversation parent-editor media mirroring is opt-in via editor.options.parentEditor β€” unchanged for standalone editors.

Register data URI image sources as DOCX media parts during export so
the relationship target resolves correctly. Map the svg+xml MIME
subtype to a .svg extension when deriving the media filename.
…processing

Detect SVG data URIs with known finite sizes and register them in place
during browser-path image handling, skipping the canvas-based resize
pipeline that strips vector content. Normalize svg+xml extensions to
.svg when generating media filenames so the relationship target matches
the stored media key.
…tadata

Only emit the empty-inline-SDT placeholder when the resolved metadata
describes a structuredContent node, so other inline SDT variants aren't
collapsed into a placeholder text run when their content is empty.
Parse data URIs by inspecting the meta header rather than assuming
base64 encoding. URL-encoded payloads (e.g. SVG with charset=utf-8)
are now decoded as text and written through as-is, while base64
payloads continue through atob/binary conversion. Adds coverage for
the non-base64 SVG path in handleBase64 and the browser
registration plugin.
Replace the base64-only data URL regex with an allowlist-based
validator that accepts URL-encoded SVG payloads while still
restricting raster image MIME types to base64. Applies to both
inline image runs and field annotation images, and adds tests for
the SVG, raster, and non-image cases.
@luccas-harbour luccas-harbour requested a review from a team as a code owner May 26, 2026 20:42
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 26, 2026

SD-3116

@github-actions
Copy link
Copy Markdown
Contributor

I'll skip the spec lookup since permission wasn't granted β€” but I can review based on the actual XML structure being emitted.

Looking at the three changed files:

Status: PASS

The changes are primarily data-handling refactors, not OOXML structural changes:

  • documentCommentsImporter.js β€” pure refactor moving simpleHash β†’ shared stableHexHash utility. No OOXML element/attribute changes.
  • decode-image-node-helpers.js β€” refactors how data-URI images are turned into package media files and relationships. The emitted XML structure (wp:extent, wp:effectExtent, wp:docPr, pic:nvPicPr, pic:cNvPicPr/a:picLocks, pic:blipFill/a:blip, pic:spPr/a:xfrm/a:ext/a:off, a:prstGeom, a:noFill, a:graphic/a:graphicData) is unchanged.
  • The image relationship Type URI is correct: http://schemas.openxmlformats.org/officeDocument/2006/relationships/image (ECMA-376 Β§15.2.13).
  • MIMEβ†’extension normalization (svg+xmlβ†’svg, jpegβ†’jpg, tiffβ†’tif, x-icon/vnd.microsoft.iconβ†’ico) matches standard Word media-part conventions.
  • noChangeAspect is correctly emitted only when explicitly set (default false per Β§20.1.2.2.31) β€” preserved from existing behavior.
  • wp:docPr@id is guarded as positive (OOXML requires id > 0) β€” preserved from existing behavior.

One observation worth flagging (not a spec violation, but a Word interop note): plain SVG data URIs are now exported as primary <a:blip r:embed="..."> targets pointing to .svg media. Strict ECMA-376 doesn't define SVG handling; MS Word's convention is a PNG fallback in <a:blip> plus an <a:extLst>/<a:ext uri="{96DAC541-7B7A-43D3-8B79-37D633B846F1}">/<asvg:svgBlip> extension referencing the SVG. The current export is valid OOXML (image rel type allows any image part), but older or non-Word consumers may not render the SVG without the fallback. Outside the strict scope of "spec compliance," though β€” passing.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 881cda9726

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with πŸ‘.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread shared/url-validation/index.js Outdated
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

No issues found across 27 files

Linked issue analysis

Linked issue: SD-3116: Bug: image set as presetContent in SDT field does not persist on reload

Status Acceptance criteria Notes
βœ… A block SDT containing supported rich/image placeholder content retains that content after template export and reload. The PR adds DOCX exporter changes that create media targets for data-URI images and ensures relationships are written; tests perform an export β†’ re-import and assert the block contains an image media part and the reopened document has the image.
βœ… A block SDT containing text placeholder content retains that content after document export and reload (sanity test). Round-trip tests insert block text SDTs, export and re-import, and assert text content is preserved.
βœ… An inline SDT containing text placeholder content retains that content after document export and reload (sanity test). The same round-trip suite covers inline structured content insertion and verifies the reopened inline SDT text matches the original.
βœ… The saved DOCX contains the block SDT wrapper and the expected child content under w:sdtContent (i.e., the media child is written under w:sdtContent). Tests inspect the serialized document.xml and check for w:sdt/w:sdtContent and that it contains drawing/blip entries for the image; exporter now maps data-URI to word/media/* package paths and writes relationships.
βœ… Reopening the saved template shows the same visible signature/image content inside the block SDT. The round-trip tests re-import the exported DOCX into a new editor instance and assert the reopened structured-content block has an image node with a word/media/* src and same alt/size attributes. There is also a paint-only test that repaints saved model and asserts the painted DOM img uses the original data-URI.
βœ… Regression coverage verifies the block remains and the image/rich content remains across export/import or model repaint. The PR adds an extensive test suite covering preset-content insertion β†’ paint β†’ export β†’ re-import for SVG (base64 and percent-encoded) and PNG, along with unit/browser tests for painter, image registration, importer helpers, and exporter behavior to prevent regressions.

Re-trigger cubic

Validate base64 payloads in isValidImageDataUrl before treating them
as renderable or exportable. Previously any data:image/*;base64 URL
was accepted regardless of payload contents, so malformed sources
could reach the DOCX exporter and produce unreadable media parts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants