
Commit e230889

andiwand and claude committed
PDF stage 3.6: design — Type3 glyphs + non-embedded fonts
Seed the stage-3.6 branch. Type3 char procs -> SVG via a minimal path->SVG capability pulled forward from stage 4; non-embedded standard-14 substitution + AFM widths (closes stage 2's deferred item). Stacked on 3.5. Implementation follows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
1 parent f9351ef commit e230889

1 file changed

Lines changed: 84 additions & 0 deletions

@@ -0,0 +1,84 @@
# PDF stage 3.6 — Type3 glyphs + non-embedded fonts

Design for the final stage-3 sub-stage. Status: **design draft** (no
implementation yet — this PR seeds the branch). Roadmap entry lives in
[`src/odr/internal/pdf/AGENTS.md`](../../../src/odr/internal/pdf/AGENTS.md).

Two loose ends that don't fit the "read a font program, wrap to OTF" pipeline:
fonts whose glyphs are **drawing procedures** (Type3) and fonts with **no
embedded program at all** (the standard 14 + common substitutes). Closes stage
2's deferred AFM-widths item.

## Part A — Type3 fonts

Type3 glyphs are mini content streams, not outlines, so they cannot go through
`@font-face`. They need path → SVG, which otherwise belongs to stage 4; a
**minimal path → SVG capability is pulled forward here** (decision 2026-06-19)
rather than waiting on the full graphics stage.

### What gets read

A Type3 font dictionary carries:
- **`/CharProcs`** — glyph name → content stream (drawing procedure).
- **`/Encoding`** — code → glyph name (Differences).
- **`/FontMatrix`** — glyph space → text space (Type3 fonts set their own, not
the implicit 1/1000).
- **`/FontBBox`**, **`/Resources`** — resources referenced by the procs.

### Rendering

Each char proc is already tokenized by the operator parser. The proc begins with
`d0` (colored) or `d1` (uncolored — glyph takes the fill color, bbox follows).
Run it through a small **path → SVG** emitter covering the path-construction +
painting subset (`m`/`l`/`c`/`v`/`y`/`re`/`h`, `f`/`F`/`f*`/`S`/`s`/`B`/`b`,
`W`/`W*` clip, `cm`, color ops) — the minimal slice of stage 4. Each glyph
becomes an SVG `<symbol>`/`<g>` in glyph space; the HTML layer instantiates it
once per shown glyph under the text transform (CTM × Tm × FontMatrix, font size
folded in), so `/FontMatrix` sets the glyph's scale.
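The placement transform above could be composed roughly as follows. This is a sketch under stated assumptions, not the planned implementation: `Matrix`, `concat`, `glyph_to_page` and `svg_transform` are hypothetical names, horizontal scaling and text rise are ignored, and the y-axis flip between PDF and HTML/SVG coordinates is left to the caller. PDF writes matrices for row vectors, so the matrix applied first sits leftmost; that is the same product as CTM × Tm × FontMatrix read in column-vector order.

```cpp
#include <array>
#include <sstream>
#include <string>

// A PDF matrix [a b c d e f]: (x, y) maps to (a*x + c*y + e, b*x + d*y + f).
// SVG's matrix(a b c d e f) takes the same six numbers in the same order.
using Matrix = std::array<double, 6>;

// Returns the matrix that applies `first`, then `second`.
Matrix concat(const Matrix &first, const Matrix &second) {
  return Matrix{{first[0] * second[0] + first[1] * second[2],
                 first[0] * second[1] + first[1] * second[3],
                 first[2] * second[0] + first[3] * second[2],
                 first[2] * second[1] + first[3] * second[3],
                 first[4] * second[0] + first[5] * second[2] + second[4],
                 first[4] * second[1] + first[5] * second[3] + second[5]}};
}

// Hypothetical helper: glyph space -> page space for one shown glyph.
// Order of application: FontMatrix, font size, Tm, CTM.
Matrix glyph_to_page(const Matrix &font_matrix, double font_size,
                     const Matrix &tm, const Matrix &ctm) {
  const Matrix scale = {{font_size, 0, 0, font_size, 0, 0}};
  return concat(concat(concat(font_matrix, scale), tm), ctm);
}

// Formats a matrix as the value of an SVG `transform` attribute.
std::string svg_transform(const Matrix &m) {
  std::ostringstream out;
  out << "matrix(" << m[0] << ' ' << m[1] << ' ' << m[2] << ' ' << m[3] << ' '
      << m[4] << ' ' << m[5] << ')';
  return out.str();
}
```

The resulting string would go on the element that instantiates the glyph's `<symbol>`, so the glyph geometry itself stays in glyph space and can be shared across shows.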

Type3 has no glyph program → no PUA re-encode, no reverse map. Unicode for
selection still comes from the stage-1 chain (`/ToUnicode` / `/Encoding`); the
dual-layer model holds (SVG glyph layer + transparent Unicode layer).

This path → SVG emitter is written to be **reused by stage 4** for page-level
vector content.
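As an illustration of the path-construction half of such an emitter, the sketch below turns already-tokenized operators into SVG path data. It is a minimal sketch, not the emitter this design will ship: `Op` and `path_data` are hypothetical names, painting, clipping, `cm` and color operators are skipped, and operators with the wrong operand count are silently ignored.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One tokenized content-stream operation (hypothetical shape).
struct Op {
  std::string name;         // operator, e.g. "m", "re"
  std::vector<double> args; // operands in content-stream order
};

// Converts PDF path-construction operators into an SVG path `d` string.
std::string path_data(const std::vector<Op> &ops) {
  std::ostringstream d;
  double cx = 0, cy = 0; // current point
  double sx = 0, sy = 0; // start of the current subpath
  for (const Op &op : ops) {
    const std::vector<double> &a = op.args;
    if (op.name == "m" && a.size() == 2) {
      d << "M " << a[0] << ' ' << a[1] << ' ';
      cx = sx = a[0];
      cy = sy = a[1];
    } else if (op.name == "l" && a.size() == 2) {
      d << "L " << a[0] << ' ' << a[1] << ' ';
      cx = a[0];
      cy = a[1];
    } else if (op.name == "c" && a.size() == 6) {
      d << "C " << a[0] << ' ' << a[1] << ' ' << a[2] << ' ' << a[3] << ' '
        << a[4] << ' ' << a[5] << ' ';
      cx = a[4];
      cy = a[5];
    } else if (op.name == "v" && a.size() == 4) {
      // First control point is the current point.
      d << "C " << cx << ' ' << cy << ' ' << a[0] << ' ' << a[1] << ' '
        << a[2] << ' ' << a[3] << ' ';
      cx = a[2];
      cy = a[3];
    } else if (op.name == "y" && a.size() == 4) {
      // Second control point coincides with the end point.
      d << "C " << a[0] << ' ' << a[1] << ' ' << a[2] << ' ' << a[3] << ' '
        << a[2] << ' ' << a[3] << ' ';
      cx = a[2];
      cy = a[3];
    } else if (op.name == "re" && a.size() == 4) {
      // x y w h: a closed rectangle as its own subpath.
      d << "M " << a[0] << ' ' << a[1] << " L " << a[0] + a[2] << ' ' << a[1]
        << " L " << a[0] + a[2] << ' ' << a[1] + a[3] << " L " << a[0] << ' '
        << a[1] + a[3] << " Z ";
      cx = sx = a[0];
      cy = sy = a[1];
    } else if (op.name == "h") {
      d << "Z ";
      cx = sx;
      cy = sy;
    }
  }
  std::string out = d.str();
  if (!out.empty()) out.pop_back(); // drop the trailing space
  return out;
}
```

Keeping the function free of font-specific state is what would let the same code serve both Type3 char procs and page-level paths.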

## Part B — non-embedded fonts

A font with no `/FontFile*`: substitute and metric-match rather than render true
glyphs.

- **Substitution** — map the standard 14 (Helvetica/Times/Courier families +
Symbol/ZapfDingbats) and common names to CSS `font-family` fallback stacks
(serif/sans/mono by flags + name heuristics; bold/italic from the descriptor
`/Flags` and the name). No `@font-face`.
- **Metrics** — drive placement from the PDF `/Widths` when present; for the
standard 14 (which usually ship no `/Widths`) use the **AFM advance-width
tables**, closing stage 2's deferred item. AFM widths become a generated data
table (`tools/pdf/generate_*`), like the encoding / AGL tables in
`pdf_encoding_data`.
- Glyph shapes come from the browser's fallback font — display fidelity is
bounded here by design (no program to embed); selection/search are exact.
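The substitution bullet could look roughly like this. It is a sketch only: `CssFont`, `substitute` and the concrete fallback stacks are assumptions chosen for illustration, and Symbol/ZapfDingbats are left out. The bit values are those of the font descriptor `/Flags` entry (FixedPitch is bit 1, Serif bit 2, Italic bit 7, ForceBold bit 19). Name heuristics are checked before flags here, which is one possible ordering rather than a decision taken by this design.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Font descriptor /Flags bits used by this sketch.
constexpr unsigned kFixedPitch = 1u << 0;
constexpr unsigned kSerif = 1u << 1;
constexpr unsigned kItalic = 1u << 6;
constexpr unsigned kForceBold = 1u << 18;

struct CssFont {
  std::string family; // value for the CSS font-family property
  bool bold;
  bool italic;
};

// Hypothetical mapping from /BaseFont name + /Flags to a CSS fallback.
CssFont substitute(const std::string &base_font, unsigned flags) {
  std::string name = base_font;
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  auto has = [&name](const char *needle) {
    return name.find(needle) != std::string::npos;
  };

  // Illustrative stacks; the real ones are not specified in this document.
  const std::string sans = "Helvetica, Arial, sans-serif";
  const std::string serif = "\"Times New Roman\", Times, serif";
  const std::string mono = "\"Courier New\", Courier, monospace";

  std::string family;
  if (has("courier")) {
    family = mono;
  } else if (has("times")) {
    family = serif;
  } else if (has("helvetica") || has("arial")) {
    family = sans;
  } else if ((flags & kFixedPitch) != 0) {
    family = mono; // unknown name: fall back to the descriptor flags
  } else if ((flags & kSerif) != 0) {
    family = serif;
  } else {
    family = sans;
  }

  const bool bold = has("bold") || (flags & kForceBold) != 0;
  const bool italic =
      has("italic") || has("oblique") || (flags & kItalic) != 0;
  return CssFont{family, bold, italic};
}
```

For example, `substitute("Helvetica-BoldOblique", 32)` yields the sans stack with both `bold` and `italic` set, which the HTML layer could translate to `font-weight` and `font-style`.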

## Module touchpoints

- `internal/svg/` (or `internal/font/type3_*`) — the minimal path → SVG emitter
(new, shared with stage 4).
- `html/pdf_file.cpp` — Type3 SVG glyph emission; non-embedded substitution +
font-family stacks.
- `pdf_encoding_data` (or sibling) — generated AFM width tables for the standard 14.
- `pdf_document_element` — Type3 `/CharProcs`/`/FontMatrix`/`/Resources` on
`Font`; substitution facts for the non-embedded path.

## Scope / non-goals

- Only the path-painting operator subset needed by real Type3 procs; full stage-4
graphics (shadings, images, patterns) stay in stage 4.
- No font synthesis for non-embedded fonts — substitution + metrics only.

## Tests

Type3: an inline content-stream font (a `d1` proc drawing a rectangle) asserting
the emitted SVG path and its placement transform. Non-embedded: standard-14 AFM
width lookup through `advance_width`, and the substitution family-stack mapping
for a few representative names/flag combinations.
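The width precedence that the AFM test would exercise can be sketched as below. The signature of the real `advance_width` is not shown in this document, so `resolve_width`, `PdfWidths` and `AfmTable` are hypothetical stand-ins. The sketch assumes the glyph name has already been resolved from the encoding and that widths are in 1/1000 text-space units; when `/Widths` is present, codes outside its range get the caller's missing-width value instead of the AFM value.

```cpp
#include <map>
#include <string>
#include <vector>

// /FirstChar and /Widths from the font dictionary (hypothetical shape).
struct PdfWidths {
  int first_char;
  std::vector<double> widths;
};

// Glyph name -> advance width, as a generated AFM table might expose it.
using AfmTable = std::map<std::string, double>;

// Prefers the PDF's own /Widths; uses the AFM table only when the font
// ships no /Widths at all (pass nullptr for pdf_widths in that case).
double resolve_width(int code, const std::string &glyph_name,
                     const PdfWidths *pdf_widths, const AfmTable &afm,
                     double missing_width) {
  if (pdf_widths != nullptr) {
    const int index = code - pdf_widths->first_char;
    if (index >= 0 && index < static_cast<int>(pdf_widths->widths.size())) {
      return pdf_widths->widths[index];
    }
    return missing_width;
  }
  const auto it = afm.find(glyph_name);
  return it != afm.end() ? it->second : missing_width;
}
```

A test along these lines would feed a two-entry excerpt of the Helvetica table (`space` = 278, `A` = 667) and check that code 65 resolves to 667 without `/Widths` and to the PDF's own value when `/Widths` covers it.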
