
fix(make-pdf): correct CJK rendering (URL sentinel leak, JP-first fonts, CJK quotes)#2012

Draft
rssprivacy-commits wants to merge 1 commit into garrytan:main from rssprivacy-commits:cjk-render-fixes

@rssprivacy-commits

Three defects surface when rendering Simplified-Chinese documents with make-pdf. All three were reproduced on v1.57.10.0, fixed at the source, and verified.

1. Bare URLs corrupt the document (worst)

Any bare URL leaks internal SMARTPANTS_PRESERVED_N sentinels as visible text and leaves the <a>/<p> unclosed, so every heading and paragraph after the URL becomes part of one giant hyperlink.

Root cause: smartypants.ts carves tags into NUL-delimited SMARTPANTS_PRESERVED_N placeholders, and only then does URL_RE = /\bhttps?:\/\/\S+/g run. Because a NUL is not whitespace, \S+ greedily swallows the adjacent </a>/</p> placeholders, and the single-pass restore then cannot un-nest them.
Fix: stop the URL match at the NUL boundary by excluding NUL from the character class, e.g. [^\s\u0000]+ (a bare [^\s]+ is equivalent to \S+ and still runs across the NUL).
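The failure can be reproduced in isolation. This sketch assumes the placeholder format is NUL + SMARTPANTS_PRESERVED_<n> + NUL, inferred from the description rather than copied from smartypants.ts; OLD_URL_RE is the pattern quoted above, while FIXED_URL_RE and the firstUrl helper are illustrative names for one way to express the NUL-bounded match.

```typescript
// Assumed placeholder format: NUL + SMARTPANTS_PRESERVED_<n> + NUL.
// Inferred from the PR description; the real smartypants.ts may differ.
const NUL = "\u0000";
const placeholder = (n: number): string => `${NUL}SMARTPANTS_PRESERVED_${n}${NUL}`;

// Pattern quoted in the PR: \S also matches NUL, so it runs into the placeholder.
const OLD_URL_RE = /\bhttps?:\/\/\S+/g;
// NUL-bounded variant (illustrative): the class excludes whitespace AND NUL.
const FIXED_URL_RE = /\bhttps?:\/\/[^\s\u0000]+/g;

// "<p>See https://example.com</p>" after both tags have been carved out.
const carved = `${placeholder(0)}See https://example.com${placeholder(1)}`;

// Return the first URL the pattern finds, or null if there is none.
function firstUrl(re: RegExp, text: string): string | null {
  const matches = text.match(re);
  return matches?.[0] ?? null;
}

console.log(JSON.stringify(firstUrl(OLD_URL_RE, carved)));
// prints "https://example.com\u0000SMARTPANTS_PRESERVED_1\u0000"
console.log(JSON.stringify(firstUrl(FIXED_URL_RE, carved)));
// prints "https://example.com"
```

The old match carries the closing-tag placeholder inside the URL, which is exactly what the single-pass restore cannot unwind; the bounded match leaves the placeholder outside.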

2. Simplified-Chinese renders in Japanese glyphs

print-css.ts CJK_STACK lists Hiragino Kaku Gothic ProN (Japanese) before any Chinese font, so Chinese text is rendered with Japanese glyph variants (直/骨/角/没/别); pdffonts shows a mix of HiraKakuProN and PingFangSC in one document.
Fix: put PingFang SC, Heiti SC, Noto Sans CJK SC, Source Han Sans SC and Microsoft YaHei first, and demote the JP fonts to last resort.
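In a CSS font-family list the renderer tries each family in order and uses the first one that has a glyph for the character, which is why the order alone decides JP versus SC forms. A minimal sketch of the SC-first stack, assuming CJK_STACK is assembled from an array of family names (the real shape of the constant in print-css.ts is not shown here); the fontFamily helper and the trailing sans-serif generic are assumptions.

```typescript
// SC-first order from the PR description; Japanese fonts go last.
// Assumption: the stack is built from arrays like these.
const SC_FONTS = [
  "PingFang SC",
  "Heiti SC",
  "Noto Sans CJK SC",
  "Source Han Sans SC",
  "Microsoft YaHei",
];
const JP_FONTS = ["Hiragino Kaku Gothic ProN"]; // last-resort fallback only

// Quote each family name (they contain spaces) and end with a generic family.
function fontFamily(families: string[], generic = "sans-serif"): string {
  return [...families.map((f) => `"${f}"`), generic].join(", ");
}

const CJK_STACK = fontFamily([...SC_FONTS, ...JP_FONTS]);
console.log(CJK_STACK);
// prints "PingFang SC", "Heiti SC", "Noto Sans CJK SC", "Source Han Sans SC", "Microsoft YaHei", "Hiragino Kaku Gothic ProN", sans-serif
```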

3. Opening quotes after CJK punctuation render as closing quotes

A double or single quote directly after CJK punctuation (：，。（「) was rendered as a closing quote, because the opening-quote heuristic only treats a quote as opening after whitespace or brackets.
Fix: add the CJK punctuation and openers ：，。、；！？（【「『〈《 to the opening-quote context. Ideographs are intentionally excluded, so a closing quote after a Han character stays closing.
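To see why a quote after ： came out closing, here is a hypothetical reduction of the pre-fix rule as described above: a quote opens only at the start of the text or after whitespace or an ASCII opening bracket. The function name and the exact bracket set are assumptions, not the code in smartypants.ts.

```typescript
// Hypothetical reduction of the pre-fix rule: a straight quote is "opening"
// only at the start of the text or after whitespace / an ASCII opening bracket.
function isOpeningBeforeFix(prev: string | undefined): boolean {
  return prev === undefined || /[\s([{]/.test(prev);
}

console.log(isOpeningBeforeFix(" "));  // true
console.log(isOpeningBeforeFix("("));  // true
console.log(isOpeningBeforeFix("："));  // false: fullwidth colon is not in the set
console.log(isOpeningBeforeFix("（"));  // false: fullwidth bracket is not in the set
console.log(isOpeningBeforeFix("「")); // false
```

Every fullwidth mark fails the check, so the quote falls through to the closing branch.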

Verification

  • pdffonts: PingFang-only, zero Hiragino
  • pdftotext: no SMARTPANTS_PRESERVED leak; correct curly quotes
  • Visual render: no anchor bleed, uniform Chinese glyphs
  • bun test make-pdf/test/: 91 pass / 0 fail (unchanged before/after)
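The first two checks can be scripted over captured tool output. A sketch of pure helpers (hypothetical names), assuming pdftotext output is plain text and that pdffonts prints two header lines followed by one font per row whose first column is the font name, possibly with a six-letter subset prefix such as ABCDEF+. Running the tools themselves is left out.

```typescript
// Check 1: no carved placeholder survives into the extracted text.
function hasSentinelLeak(pdfText: string): boolean {
  return /SMARTPANTS_PRESERVED_\d+/.test(pdfText);
}

// Check 2: list embedded font names from `pdffonts` output.
// Assumes two header lines, then one font per row with the name first.
function fontNames(pdffontsOutput: string): string[] {
  return pdffontsOutput
    .split("\n")
    .slice(2) // skip the column-title line and the dashes line
    .map((line) => line.trim().split(/\s+/)[0] ?? "")
    .filter((name) => name.length > 0)
    .map((name) => name.replace(/^[A-Z]{6}\+/, "")); // drop subset tag
}

// Hiragino PostScript names start with "Hira" (e.g. HiraKakuProN).
function usesHiragino(pdffontsOutput: string): boolean {
  return fontNames(pdffontsOutput).some((n) => n.startsWith("Hira"));
}

// Illustrative pdffonts output (format assumed, values made up).
const samplePdffonts = [
  "name                                 type              encoding         emb sub uni object ID",
  "------------------------------------ ----------------- ---------------- --- --- --- ---------",
  "ABCDEF+PingFangSC-Regular            CID TrueType      Identity-H       yes yes yes     12  0",
].join("\n");

console.log(fontNames(samplePdffonts));   // [ "PingFangSC-Regular" ]
console.log(usesHiragino(samplePdffonts)); // false
```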

Total change: 5 insertions / 5 deletions across 2 files. Opened as draft for maintainer review.

🤖 Generated with Claude Code

@trunk-io

trunk-io Bot commented Jun 15, 2026


Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

…nts, CJK quotes

Three defects surfaced rendering Simplified-Chinese documents; refined after
an independent two-model code audit (which caught a regression in the first
pass of the quote fix).

1. Bare URLs leaked internal `SMARTPANTS_PRESERVED_N` sentinels into output
   AND left <a>/<p> unclosed (everything after became one hyperlink).
   URL_RE's `\S+` (NUL is non-whitespace) swallowed the adjacent tag
   placeholders; single-pass restore could not un-nest them.
   Fix: stop the URL match at the NUL boundary; additionally strip any
   stray NUL from input at smartypants() entry so text cannot forge a
   placeholder or create NUL-adjacency nesting.
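Why stripping NUL closes the forging hole: if carving and restoring key on NUL-delimited markers, a literal NUL-wrapped marker in the input would be restored as a real tag. A toy carve/restore round trip, assuming placeholders of the form NUL + SMARTPANTS_PRESERVED_<n> + NUL; carveTags and restoreTags are hypothetical helpers, not the functions in smartypants.ts.

```typescript
const NUL = "\u0000";
const PLACEHOLDER_RE = /\u0000SMARTPANTS_PRESERVED_(\d+)\u0000/g;

// Hypothetical helper: replace each tag with a NUL-delimited placeholder.
// Stray NULs are stripped first, so input text can never contain a marker.
function carveTags(html: string): { text: string; preserved: string[] } {
  const preserved: string[] = [];
  const text = html
    .replace(/\u0000/g, "")
    .replace(/<[^>]*>/g, (tag) => {
      preserved.push(tag);
      return `${NUL}SMARTPANTS_PRESERVED_${preserved.length - 1}${NUL}`;
    });
  return { text, preserved };
}

// Hypothetical single-pass restore: swap each marker back for its tag.
function restoreTags(text: string, preserved: string[]): string {
  return text.replace(PLACEHOLDER_RE, (marker, n: string) => preserved[Number(n)] ?? marker);
}

// Input that tries to forge placeholder 0 (which will hold "<b>").
const forged = `<b>x</b>${NUL}SMARTPANTS_PRESERVED_0${NUL}`;
const carvedForged = carveTags(forged);
console.log(restoreTags(carvedForged.text, carvedForged.preserved));
// prints <b>x</b>SMARTPANTS_PRESERVED_0
```

Without the initial NUL strip, the same input restores to `<b>x</b><b>`: the forged marker is indistinguishable from a real one.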

2. The CJK font stack listed Hiragino (Japanese) before any Chinese font,
   so Simplified-Chinese text rendered in Japanese glyph variants
   (直/骨/角/没). Fix: PingFang SC / Noto Sans CJK SC / Source Han Sans SC
   / Microsoft YaHei first; JP fonts demoted to last resort. (Trade-off:
   true Japanese documents now prefer SC glyphs for shared Han; acceptable
   for an SC-primary tool. A lang-attribute-based selector would be the
   fuller fix.)
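The lang-attribute approach mentioned above could look like an SC-first default plus a :lang(ja) override. This sketch only generates the CSS; it assumes make-pdf would emit lang="ja" on Japanese content, which this PR does not do, and the helper names and sans-serif fallback are likewise assumptions. Only the family names come from the PR.

```typescript
// Hypothetical lang-aware font CSS; only the family names come from the PR.
const SC_FONTS = ["PingFang SC", "Noto Sans CJK SC", "Source Han Sans SC", "Microsoft YaHei"];
const JP_FONTS = ["Hiragino Kaku Gothic ProN"];

// Quote family names and append a generic fallback.
function family(list: string[]): string {
  return [...list.map((f) => `"${f}"`), "sans-serif"].join(", ");
}

// Default rule: SC first (the behaviour after this PR).
// :lang(ja) rule: JP first, so content marked lang="ja" keeps Japanese forms.
// :lang(ja) is more specific than `body`, so it wins where both match.
function cjkFontCss(): string {
  return [
    `body { font-family: ${family([...SC_FONTS, ...JP_FONTS])}; }`,
    `:lang(ja) { font-family: ${family([...JP_FONTS, ...SC_FONTS])}; }`,
  ].join("\n");
}

console.log(cjkFontCss());
```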

3. A quote directly after a CJK colon or opening bracket (：（【「『〈《)
   is now treated as opening. Sentence/clause-ending punctuation
   (，。、；！？) is deliberately excluded: a quote after those is usually
   a CLOSING quote (Chinese puts the period inside: 。"), and including
   them flipped closing quotes to opening.
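The refined rule can be sketched as a previous-character check. This is a hypothetical standalone reduction (isOpeningQuote and curlDoubleQuotes are illustrative names, and only double quotes are handled); the set of CJK openers is the one listed above.

```typescript
// CJK contexts after which a straight quote is treated as OPENING.
// Clause/sentence enders (，。、；！？) are deliberately absent: a quote
// after them is usually closing, e.g. 他说："你好。"
const CJK_OPENERS = new Set(["：", "（", "【", "「", "『", "〈", "《"]);

function isOpeningQuote(prev: string | undefined): boolean {
  if (prev === undefined) return true;   // start of text
  if (/[\s([{]/.test(prev)) return true; // whitespace or ASCII opening bracket
  return CJK_OPENERS.has(prev);          // CJK colon or opening bracket
}

// Curl straight double quotes using the rule above (single quotes omitted).
function curlDoubleQuotes(text: string): string {
  let out = "";
  let prev: string | undefined = undefined;
  for (const ch of text) {
    if (ch === '"') {
      out += isOpeningQuote(prev) ? "\u201C" : "\u201D";
    } else {
      out += ch;
    }
    prev = ch; // compare against the original previous character
  }
  return out;
}

console.log(curlDoubleQuotes('他说："你好。"')); // prints 他说：“你好。”
```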

Verified: pdffonts PingFang-only; pdftotext no sentinel leak, correct
opening AND closing quotes (他说："你好。" closes correctly); visual render
no anchor bleed. make-pdf/test: 91 pass / 0 fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
