| 1 | +# PDF stage 3.6 — Type3 glyphs + non-embedded fonts |
| 2 | + |
| 3 | +Design for the final stage-3 sub-stage. Status: **design draft** (no |
| 4 | +implementation yet — this PR seeds the branch). Roadmap entry lives in |
| 5 | +[`src/odr/internal/pdf/AGENTS.md`](../../../src/odr/internal/pdf/AGENTS.md). |
| 6 | + |
| 7 | +Two loose ends that don't fit the "read a font program, wrap to OTF" pipeline: |
| 8 | +fonts whose glyphs are **drawing procedures** (Type3) and fonts with **no |
| 9 | +embedded program at all** (the standard 14 + common substitutes). Closes stage |
| 10 | +2's deferred AFM-widths item. |
| 11 | + |
| 12 | +## Part A — Type3 fonts |
| 13 | + |
| 14 | +Type3 glyphs are mini content streams, not outlines, so they cannot go through |
| 15 | +`@font-face`. They need path → SVG, which otherwise belongs to stage 4; a |
| 16 | +**minimal path → SVG capability is pulled forward here** (decision 2026-06-19) |
| 17 | +rather than waiting on the full graphics stage. |
| 18 | + |
| 19 | +### What gets read |
| 20 | + |
| 21 | +A Type3 font dictionary carries: |
| 22 | +- **`/CharProcs`** — glyph name → content stream (drawing procedure). |
| 23 | +- **`/Encoding`** — code → glyph name (Differences). |
| 24 | +- **`/FontMatrix`** — glyph space → text space (Type3 fonts set their own, not |
| 25 | + the implicit 1/1000). |
| 26 | +- **`/FontBBox`**, **`/Resources`** — glyph-space bounding box of all glyphs; resources referenced by the procs. |
| 27 | + |
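
As a rough illustration of how these entries fit together, the sketch below resolves a character code to its char proc via the encoding. The `Type3Font` struct and `find_char_proc` are hypothetical names for this example only; the real fields on `Font` in `pdf_document_element` may look quite different.

```cpp
#include <array>
#include <map>
#include <string>

// Hypothetical in-memory shape of a Type3 font (not the project's actual type).
struct Type3Font {
  // /CharProcs: glyph name -> raw content stream bytes.
  std::map<std::string, std::string> char_procs;
  // /Encoding after applying Differences: code -> glyph name ("" = unmapped).
  std::array<std::string, 256> encoding;
  // /FontMatrix as [a b c d e f]; Type3 fonts supply it, so no 1/1000 default.
  std::array<double, 6> font_matrix{};
};

// Returns the char proc for `code`, or nullptr when the code is unmapped
// or the glyph name has no entry in /CharProcs.
const std::string *find_char_proc(const Type3Font &font, unsigned char code) {
  const std::string &name = font.encoding[code];
  if (name.empty()) {
    return nullptr;
  }
  auto it = font.char_procs.find(name);
  return it == font.char_procs.end() ? nullptr : &it->second;
}
```

Returning a pointer keeps the "missing glyph" case explicit, so a caller can skip drawing while still emitting the Unicode layer.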
| 28 | +### Rendering |
| 29 | + |
| 30 | +Each char proc is already tokenized by the operator parser. The proc begins with |
| 31 | +`d0` (colored) or `d1` (uncolored: the glyph takes the current fill color, and the operands carry the glyph bbox after the width). |
| 32 | +Run it through a small **path → SVG** emitter covering the path-construction + |
| 33 | +painting subset (`m`/`l`/`c`/`v`/`y`/`re`/`h`, `f`/`F`/`f*`/`S`/`s`/`B`/`b`, |
| 34 | +`W`/`W*` clip, `cm`, color ops) — the minimal slice of stage 4. Each glyph |
| 35 | +becomes an SVG `<symbol>`/`<g>` in glyph space; the HTML layer instantiates it |
| 36 | +per show at the text transform (CTM × Tm × FontMatrix, font size folded in, |
| 37 | +`/FontMatrix` applied first), so glyph scale comes from `/FontMatrix` alone. |
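
The placement transform can be sketched as a plain affine composition. This is a minimal illustration, not the planned implementation: `Matrix`, `then`, and `glyph_to_page` are hypothetical names, and horizontal scaling and text rise are left out. It uses the PDF `[a b c d e f]` layout, which carries the same six numbers as an SVG `matrix(a b c d e f)`.

```cpp
// PDF-style affine matrix [a b c d e f]:
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
struct Matrix {
  double a, b, c, d, e, f;
};

// Returns the matrix that applies `first`, then `second`.
Matrix then(const Matrix &first, const Matrix &second) {
  return Matrix{
      first.a * second.a + first.b * second.c,
      first.a * second.b + first.b * second.d,
      first.c * second.a + first.d * second.c,
      first.c * second.b + first.d * second.d,
      first.e * second.a + first.f * second.c + second.e,
      first.e * second.b + first.f * second.d + second.f,
  };
}

// Glyph space -> page space: /FontMatrix first, then the font size,
// then the text matrix Tm, then the CTM.
Matrix glyph_to_page(const Matrix &font_matrix, double font_size,
                     const Matrix &tm, const Matrix &ctm) {
  const Matrix size{font_size, 0, 0, font_size, 0, 0};
  return then(then(then(font_matrix, size), tm), ctm);
}
```

With a `/FontMatrix` of `[0.001 0 0 0.001 0 0]`, a 12-unit font size, and a text matrix that translates by (100, 200), the result scales glyph units by about 0.012 and translates by (100, 200). Any flip between PDF's y-up space and the HTML layer's y-down space is assumed to live in the CTM passed in.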
| 38 | + |
| 39 | +Type3 has no glyph program → no PUA re-encode, no reverse map. Unicode for |
| 40 | +selection still comes from the stage-1 chain (`/ToUnicode` / `/Encoding`); the |
| 41 | +dual-layer model holds (SVG glyph layer + transparent Unicode layer). |
| 42 | + |
| 43 | +This path → SVG emitter is written to be **reused by stage 4** for page-level |
| 44 | +vector content. |
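
A minimal sketch of the path-construction half of such an emitter follows, assuming the operator parser hands over operator/operand pairs. `PathOp` and `path_to_svg_d` are hypothetical names; painting operators, clipping, `cm`, and color are deliberately omitted here.

```cpp
#include <initializer_list>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token shape; the real operator parser's types may differ.
struct PathOp {
  std::string op;            // PDF operator, e.g. "m", "l", "re"
  std::vector<double> args;  // operands in PDF order
};

// Converts PDF path-construction operators into SVG path data in glyph space.
// Unknown operators and wrong operand counts are skipped.
std::string path_to_svg_d(const std::vector<PathOp> &ops) {
  std::ostringstream d;
  auto emit = [&d](char cmd, std::initializer_list<double> values) {
    d << cmd;
    bool first = true;
    for (double v : values) {
      if (!first) d << ' ';
      d << v;
      first = false;
    }
  };
  double cx = 0, cy = 0;  // current point
  double sx = 0, sy = 0;  // start of the current subpath
  for (const PathOp &o : ops) {
    const std::vector<double> &a = o.args;
    if (o.op == "m" && a.size() == 2) {
      emit('M', {a[0], a[1]});
      cx = sx = a[0];
      cy = sy = a[1];
    } else if (o.op == "l" && a.size() == 2) {
      emit('L', {a[0], a[1]});
      cx = a[0];
      cy = a[1];
    } else if (o.op == "c" && a.size() == 6) {
      emit('C', {a[0], a[1], a[2], a[3], a[4], a[5]});
      cx = a[4];
      cy = a[5];
    } else if (o.op == "v" && a.size() == 4) {
      // First control point is the current point.
      emit('C', {cx, cy, a[0], a[1], a[2], a[3]});
      cx = a[2];
      cy = a[3];
    } else if (o.op == "y" && a.size() == 4) {
      // Second control point coincides with the end point.
      emit('C', {a[0], a[1], a[2], a[3], a[2], a[3]});
      cx = a[2];
      cy = a[3];
    } else if (o.op == "re" && a.size() == 4) {
      const double x = a[0], y = a[1], w = a[2], h = a[3];
      emit('M', {x, y});
      emit('L', {x + w, y});
      emit('L', {x + w, y + h});
      emit('L', {x, y + h});
      emit('Z', {});
      cx = sx = x;
      cy = sy = y;
    } else if (o.op == "h" && a.empty()) {
      emit('Z', {});
      cx = sx;
      cy = sy;
    }
  }
  return d.str();
}
```

Expanding `v` and `y` into full cubic `C` commands keeps the output to a single curve primitive, which may make the emitter easier to share with page-level content later.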
| 45 | + |
| 46 | +## Part B — non-embedded fonts |
| 47 | + |
| 48 | +A font with no `/FontFile*`: substitute and metric-match rather than render true |
| 49 | +glyphs. |
| 50 | + |
| 51 | +- **Substitution** — map the standard 14 (Helvetica/Times/Courier families + |
| 52 | + Symbol/ZapfDingbats) and common names to CSS `font-family` fallback stacks |
| 53 | + (serif/sans/mono by flags + name heuristics; bold/italic from the descriptor |
| 54 | + `/Flags` and the name). No `@font-face`. |
| 55 | +- **Metrics** — drive placement from the PDF `/Widths` when present; for the |
| 56 | + standard 14 (which usually ship no `/Widths`) use the **AFM advance-width |
| 57 | + tables**, closing stage 2's deferred item. AFM widths become a generated data |
| 58 | + table (`tools/pdf/generate_*`), like the encoding / AGL tables in |
| 59 | + `pdf_encoding_data`. |
| 60 | +- Glyph shapes are the browser's fallback font — display fidelity is bounded |
| 61 | + here by design (no program to embed); selection/search are exact. |
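
The substitution step might look roughly like the sketch below. Everything here is an assumption made for illustration: the `CssFont` struct, the `substitute_font` name, the name heuristics, and the specific fallback stacks are not taken from the design. Only the `/Flags` bit positions (FixedPitch bit 1, Serif bit 2, Italic bit 7, ForceBold bit 19) follow the PDF font descriptor definition. Symbol and ZapfDingbats are not handled.

```cpp
#include <cstdint>
#include <string>

// Font descriptor /Flags bits (bit N in the spec is 1 << (N - 1)).
constexpr std::uint32_t kFixedPitch = 1u << 0;
constexpr std::uint32_t kSerif = 1u << 1;
constexpr std::uint32_t kItalic = 1u << 6;
constexpr std::uint32_t kForceBold = 1u << 18;

// Hypothetical result type: a CSS font-family stack plus style bits.
struct CssFont {
  std::string family;
  bool bold = false;
  bool italic = false;
};

static bool contains(const std::string &s, const char *needle) {
  return s.find(needle) != std::string::npos;
}

// Picks a fallback stack from the base font name first, then from /Flags.
// The stacks themselves are illustrative choices only.
CssFont substitute_font(const std::string &base_font, std::uint32_t flags) {
  CssFont result;
  if (contains(base_font, "Courier") || (flags & kFixedPitch) != 0) {
    result.family = "\"Courier New\", Courier, monospace";
  } else if (contains(base_font, "Times") || (flags & kSerif) != 0) {
    result.family = "\"Times New Roman\", Times, serif";
  } else {
    result.family = "Helvetica, Arial, sans-serif";
  }
  result.bold = contains(base_font, "Bold") || (flags & kForceBold) != 0;
  result.italic = contains(base_font, "Italic") ||
                  contains(base_font, "Oblique") || (flags & kItalic) != 0;
  return result;
}
```

For example, `substitute_font("Courier-BoldOblique", 0)` yields the monospace stack with both `bold` and `italic` set, while an unknown name with only the Serif flag falls through to the serif stack.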
| 62 | + |
| 63 | +## Module touchpoints |
| 64 | + |
| 65 | +- `internal/svg/` (or `internal/font/type3_*`) — the minimal path → SVG emitter |
| 66 | + (new, shared with stage 4). |
| 67 | +- `html/pdf_file.cpp` — Type3 SVG glyph emission; non-embedded substitution + |
| 68 | + font-family stacks. |
| 69 | +- `pdf_encoding_data` (or sibling) — generated AFM width tables for the standard 14. |
| 70 | +- `pdf_document_element` — Type3 `/CharProcs`/`/FontMatrix`/`/Resources` on |
| 71 | + `Font`; substitution facts for the non-embedded path. |
| 72 | + |
| 73 | +## Scope / non-goals |
| 74 | + |
| 75 | +- Only the path-painting operator subset needed by real Type3 procs; full stage-4 |
| 76 | + graphics (shadings, images, patterns) stays in stage 4. |
| 77 | +- No font synthesis for non-embedded fonts — substitution + metrics only. |
| 78 | + |
| 79 | +## Tests |
| 80 | + |
| 81 | +Type3: an inline content-stream font (a `d1` proc drawing a rectangle) asserting |
| 82 | +the emitted SVG path and its placement transform. Non-embedded: standard-14 AFM |
| 83 | +width lookup through `advance_width`, and the substitution family-stack mapping |
| 84 | +for a few representative names/flag combinations. |
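
For orientation, the width precedence under test (PDF `/Widths` first, AFM fallback for the standard 14) could be sketched as below, assuming C++17. This is not the real `advance_width`, whose signature is not shown in this document: `FontWidths`, `afm_width`, and `width_for_code` are hypothetical, and the AFM data is a three-entry excerpt keyed by character code for brevity, whereas a generated table would more likely be keyed by glyph name.

```cpp
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical view of a font's /FirstChar and /Widths entries.
struct FontWidths {
  int first_char = 0;
  std::vector<double> widths;  // empty when the PDF ships no /Widths
};

// Tiny hand-written excerpt standing in for the generated AFM tables.
// Widths are in 1/1000 text-space units.
std::optional<double> afm_width(std::string_view base_font, int code) {
  if (base_font.substr(0, 7) == "Courier") {
    return 600.0;  // every glyph in the Courier family is 600 wide
  }
  if (base_font == "Helvetica" && code == ' ') {
    return 278.0;
  }
  if (base_font == "Times-Roman" && code == ' ') {
    return 250.0;
  }
  return std::nullopt;  // not covered by this excerpt
}

// Prefers the PDF's own /Widths; falls back to AFM data otherwise.
std::optional<double> width_for_code(const FontWidths &pdf,
                                     std::string_view base_font, int code) {
  const int index = code - pdf.first_char;
  if (index >= 0 && index < static_cast<int>(pdf.widths.size())) {
    return pdf.widths[index];
  }
  return afm_width(base_font, code);
}
```

Returning `std::optional` leaves the "no width known" case to the caller, which could then fall back to whatever default the placement code already uses.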