diff --git a/docs/getting-started.md b/docs/getting-started.md
index 1d163a5..75e1c02 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -77,6 +77,8 @@ tutorial-forge render                    # render everything
 tutorial-forge render --only my-id      # iterate on one tutorial
 tutorial-forge render --headed          # watch the browser while recording
 tutorial-forge render --phase post      # re-merge without re-recording (uses .forge/<id>/)
+tutorial-forge render --out-dir dist    # override config.outDir for this run
+tutorial-forge render --concurrency 2   # cap parallel TTS synthesis (overrides config.ttsConcurrency)
 tutorial-forge clean --cache            # remove work dirs and the TTS cache
 ```
 
diff --git a/ISSUE-ACTION-REPORT.md b/docs/reviews/2026-06-13-issue-action-report.md
similarity index 100%
rename from ISSUE-ACTION-REPORT.md
rename to docs/reviews/2026-06-13-issue-action-report.md
diff --git a/docs/reviews/2026-06-13-pedagogy-review.md b/docs/reviews/2026-06-13-pedagogy-review.md
new file mode 100644
index 0000000..8428cfd
--- /dev/null
+++ b/docs/reviews/2026-06-13-pedagogy-review.md
@@ -0,0 +1,207 @@
+# Pedagogy review — tutorial-forge (2026-06-13)
+
+Reviewer: instructional-designer (advisory). Scope: do the docs and authoring model help
+authors produce tutorials that *teach*, and where should the tool go next to make the
+*tutorials it produces* more effective at training people on complex software? Video craft
+and factual accuracy are out of scope (audited separately). Recommendations go to the
+product-manager; no issues filed.
+
+Reviewed: README.md, docs/getting-started.md, docs/writing-tutorials.md, docs/adapters.md,
+packages/example-app/tutorials/getting-started.tutorial.ts, tutorial-forge-spec.md (context).
+
+Evaluated against established multimedia-learning and cognitive-load principles
+(Mayer's segmenting, signaling, modality, redundancy, coherence, pre-training,
+temporal-contiguity; Sweller's intrinsic/extraneous/germane load; Merrill's
+worked-example→practice progression).
+
+---
+
+## Headline
+
+tutorial-forge is **engineering-excellent and pedagogy-silent.** The *machine* embodies
+several of the strongest evidence-based learning principles almost by accident of good
+engineering — but the *documentation* teaches authors only how to drive the machine, not
+how to write something that teaches. The result: the tool makes it easy to produce a
+technically-correct video and gives almost no help producing an *instructionally* good one.
+The single biggest lever is that step boundaries are already first-class in the timing
+manifest, so most high-impact pedagogy features are cheap additions, not rewrites.
+
+---
+
+## Part 1 — Pedagogy review of the docs
+
+### What the architecture gets right (credit where due)
+
+- **Narration-first pacing honors temporal contiguity and coherence.**
+  writing-tutorials.md:59-61 — narration is synthesized and measured first, and each step
+  holds on screen at least as long as its line. Audio and the corresponding visual
+  co-occur (temporal-contiguity principle), and there's no dead air or rushed-past visual.
+  This is a genuinely strong pedagogical foundation that most screen-recording tools get
+  wrong (they retime voiceover onto a fixed recording).
+- **Signaling is first-class and well-built.** Fake cursor, click callouts, `--zoom`
+  toward targets, and `opts.focus` to anchor attention (writing-tutorials.md:30, 44-54;
+  getting-started.md:85) are a textbook implementation of the signaling/cueing principle —
+  directing the learner's eye to the essential element and suppressing everything else.
+- **Extraneous load reduction.** `--idle-speedup` (getting-started.md:89) removes dead
+  waiting; pre-roll trimming removes setup. Both cut extraneous cognitive load.
+- **Modality principle honored (silently).** Narration is audio, not on-screen prose, so
+  the visual channel isn't competing with on-screen text. Good — but unstated, so an author
+  who starts pasting paragraphs of on-screen text wouldn't know they're breaking it.
+- **Localization respects speech-rate differences** (writing-tutorials.md:155): each
+  language re-derives pacing from its actual narration duration. Equity *and* correctness.
+- **Determinism / seeded state** (adapters.md:119) gives consistent, repeatable worked
+  examples — the same demo every time.
+
+### Where the docs steer authors away from teaching (the gaps)
+
+1. **Narration guidance is one sentence — the highest-leverage teaching surface is
+   undocumented.** writing-tutorials.md:29 defines narration as "the line spoken over this
+   step. Plain text (no SSML). May be `''`." That's it. There is no guidance on *how to
+   write a line that teaches*: explain the *why* not just the *what*, one idea per step,
+   use signaling language ("notice…", "here…"), write for the ear (short sentences), not
+   for reading. Ironically the example models this well —
+   "Give the event a descriptive name. **This is what attendees will see on their
+   invitations**" (getting-started.tutorial.ts:30) explains *why* — but the docs never
+   extract that as a principle, so it's luck, not guidance.
+
+2. **No guidance on step granularity / segmenting** — the single most impactful
+   multimedia principle. The docs treat a "step" as a purely mechanical unit (a narration
+   line + an action) and never tell authors how much one step should carry. The example
+   shows the inconsistency this produces: "open the Events page" is one click
+   (getting-started.tutorial.ts:14-20) while the Settings step does nav + heading-wait +
+   checkbox in a single step/line (getting-started.tutorial.ts:60-67). Segmenting theory
+   says one conceptual chunk per segment, learner able to absorb before the next. Authors
+   get no help chunking a complex procedure.
+
+3. **No learning-objective / advance-organizer pattern.** The example opens with a decent
+   advance organizer in narration ("In this short tour, we will create your first event and
+   adjust a workspace setting," getting-started.tutorial.ts:5) and closes with a recap
+   ("Your event is drafted, your workspace is configured…", :70). Both are good practice —
+   and both are invisible to the docs. There's no `Tutorial.objectives`, no documented
+   "open with what they'll be able to do, close with a recap" pattern. Pre-training and
+   advance-organizer effects go unsupported.
+
+4. **Captions are framed purely as subtitles, with no redundancy nuance.**
+   getting-started.md:83 — captions are verbatim narration, burn or sidecar. For
+   accessibility this is correct and should stay the default. But the docs present no
+   awareness that verbatim captions + identical audio is the *redundancy* condition (can
+   add load for sound-on learners), and offer no alternative (e.g. key-term captions). The
+   tension between the redundancy principle and accessibility is real and worth naming.
+
+5. **The README frames the value proposition as maintenance, not learning.** "How it
+   compares" (README.md:62-67) and the whole pitch center on *freshness* — tutorials-as-
+   code, regenerate in CI, never go stale. That's a real and excellent DevEx win, but it
+   reveals the project's center of gravity: "keep the video from going stale," not "make a
+   video that teaches well." Nothing here is wrong; it's the strategic gap that Part 2
+   addresses. A maintained-but-mediocre tutorial is still mediocre.
+
+### Inherent limit (not a doc bug, but the ceiling)
+
+The output is a linear MP4: a **worked example** the learner *watches*. There is no
+practice, no retrieval, no learner pacing within the video, no faded guidance — all of
+which the evidence says drive durable skill acquisition for procedural software tasks.
+This is the natural ceiling of "video," and it's where the most ambitious Part 2 ideas aim.
+
+---
+
+## Part 2 — New directions (prioritized by pedagogical impact)
+
+Architecture in play: narration-first pipeline; a **timing manifest** with first-class
+per-step boundaries, action windows, audio durations, and callout boxes; a post-stage
+ffmpeg filter graph; the adapter model. Step boundaries already being in the manifest is
+the key lever — several high-impact features are new *consumers* of data that already
+exists.
+
+| # | Direction | Impact | Feasibility | Why it teaches |
+|---|---|---|---|---|
+| 1 | **Chapters / segmenting** | High | High | Segmenting principle — the strongest, cheapest win |
+| 2 | **Teaching-narration guidance + germane lints** | High | High | Multiplies quality of every tutorial authored |
+| 3 | **Objective + recap cards (first-class)** | High | Medium | Advance-organizer, pre-training, summary |
+| 4 | **Interactive HTML output w/ knowledge checks** | High | Low | Retrieval practice / generative learning — highest ceiling |
+| 5 | **Caption modes (verbatim vs key-term)** | Medium | High | Resolves redundancy vs accessibility |
+| 6 | **Accessibility-as-pedagogy pass** | Medium | Medium | Universal design helps all learners |
+| 7 | **Effectiveness instrumentation** | Medium | Low (core) | Closes the loop: did it actually teach? |
+| 8 | **Faded-practice "your turn" mode** | High | Low | Watch→do; the spec is already the checker |
+
+### 1. Chapters / segmenting — *do this first*
+The manifest already has per-step start/end. Emit **MP4 chapter markers** (ffmpeg metadata
+/ chapter file) and/or a **WebVTT chapters track** plus a YouTube-style timestamp list in
+output. Optionally let authors group steps into named sections (a `section?: string` on
+step, or a `chapter('name', [...steps])` grouping). This gives learners navigation and
+self-pacing *within* a single video — the segmenting effect, one of the largest and most
+reliable in the literature — at near-zero cost. Highest impact-to-effort ratio in the list.
+
+### 2. Teaching-narration guidance + germane lints
+Add a "Writing narration that teaches" section to writing-tutorials.md (right after the
+one-line narration bullet at :29): one idea per step; explain *why*, not only *what*; use
+signaling language; open with an objective, close with a recap; write for the ear. Then
+back the most objective of these with **optional load-time warnings** — the validation
+infra already runs at spec load (writing-tutorials.md:182-183): warn on over-long
+narration per step (overload), steps bundling multiple unrelated actions (segmenting), and
+a missing intro/recap step. Cheap; multiplies the quality of every tutorial anyone authors.
+
+### 3. Objective + recap cards as first-class artifacts
+Add `objectives?: string[]` (and/or `summary?`) to the `Tutorial` type and render an
+optional **title/objective card** at the start and a **recap card** at the end. The spec
+already lists intro/outro cards as a "deferred-but-designed-for" post stage
+(tutorial-forge-spec.md:223), and the caption system already renders browser HTML →
+composited frames, so the rendering path exists. Serves the advance-organizer,
+pre-training, and summary principles, and makes the good practice the example already
+models (getting-started.tutorial.ts:5, :70) explicit and reusable rather than incidental.
+
+### 4. Interactive HTML output with knowledge checks — *the high-ceiling bet*
+A pure MP4 can't do retrieval practice, the single biggest driver of durable learning. A
+new **HTML output target** that wraps the rendered video with chapter navigation (from #1)
+and authored **knowledge-check questions between chapters** would add generative/retrieval
+activity — the thing video structurally can't. The v1 spec explicitly excluded HTML demos
+(tutorial-forge-spec.md:21); this is the right time to revisit that as a *complement*, not
+a replacement. Biggest architectural lift here, so frame it as a longer-term initiative —
+but the timing manifest makes the chaptered player itself very feasible; the questions are
+the new authoring surface.
+
+### 5. Caption modes: verbatim vs. key-term
+Offer `captions: 'full' | 'keyterms' | 'off'` (today it's effectively full-or-off,
+getting-started.md:83). `keyterms` surfaces only signaled labels/vocabulary instead of the
+full sentence — reducing redundancy load for sound-on learners while still teaching the
+vocabulary (a pre-training flavor), with `full` remaining the accessible default. Feasible:
+caption text is already per-step.
+
+### 6. Accessibility-as-pedagogy pass
+Guidance + light defaults: never narrate "click here" — name *what* and *where* (also
+serves sound-off and low-vision learners); legible caption defaults (styling already
+exists via `captionStyle`); flag steps whose narration is purely deictic. Universal-design
+moves measurably help *all* learners, not only those who need them.
+
+### 7. Effectiveness instrumentation
+The project today optimizes "the video doesn't go stale" but can't answer "did it teach?"
+If output ships with chapter markers (#1) and an analytics-friendly player (#4), consumers
+can measure per-chapter drop-off, replays, and completion — turning segment boundaries into
+a learning signal. Out of scope for the core MP4 encoder, but the natural strategic
+destination: close the loop from *maintained* to *effective*.
+
+### 8. Faded-practice "your turn" mode — *the striking architectural synergy*
+Procedural skill comes from watch→do, not watch alone. Because a tutorial is *already an
+executable Playwright spec*, the same spec could generate a **guided-practice mode**: run
+the real app, prompt the learner to perform each step themselves, and validate with the
+same locators the demo uses. The spec is simultaneously the demonstration *and* the checker
+— a synergy no recording-based tool can match. Big lift, research-grade, but the highest
+learning ceiling in this list, paired with #4.
+
+---
+
+## Recommendation to the product-manager
+
+- **Now / cheap / high-impact:** #1 (chapters) and #2 (narration guidance + lints). Both
+  are small, ride existing infrastructure (manifest, load-time validation, docs), and lift
+  every tutorial. Recommend filing these as near-term issues.
+- **Next:** #3 (objective/recap cards) and #5 (caption modes) — modest post-stage and
+  config work, clear pedagogical payoff.
+- **Strategic bets (RFC / spike first):** #4 (interactive HTML + knowledge checks),
+  #8 (faded-practice mode), #7 (instrumentation). These move the product from "videos that
+  don't go stale" to "training that demonstrably works," and #4/#8 are where the largest
+  learning gains live. They warrant a design doc before committing.
+
+The throughline: the engine already does the hard, load-bearing pedagogy (pacing,
+signaling, coherence). The cheapest wins are teaching authors to use it well (#2) and
+surfacing the structure the manifest already knows (#1, #3). The biggest wins are adding
+the one thing video fundamentally lacks — learner activity (#4, #8).
diff --git a/docs/reviews/2026-06-13-report-for-john.md b/docs/reviews/2026-06-13-report-for-john.md
new file mode 100644
index 0000000..f15dfcc
--- /dev/null
+++ b/docs/reviews/2026-06-13-report-for-john.md
@@ -0,0 +1,114 @@
+# Report for John — docs cleanup, pedagogy review & backlog triage
+
+**Date:** 2026-06-13
+**Prepared by:** Claude Code, with the product-manager and instructional-designer agents
+
+This session ran end-to-end: a documentation audit → fixes → an instructional-design
+review of the docs → backlog triage of the resulting ideas. Here's everything that
+happened and where it leaves you.
+
+---
+
+## 1. Documentation audit & fixes
+
+A product-manager audit found the **user-facing docs are current and accurate** with the
+shipped 0.10.0 surface (README, the three `docs/` guides, package READMEs, CHANGELOG,
+agents README — no drift). Two stray, non-user-facing artifacts were the only real gaps,
+both now fixed:
+
+| Fix | What changed | Issue |
+|---|---|---|
+| Historical-spec banner | Added a prominent "Historical design document" banner to `tutorial-forge-spec.md` — marks it pre-v0.1, settles final names (`tutorial-forge` / `tutorial-forge-cli` / bin `tutorial-forge`+`tforge`), points to README/docs/CHANGELOG as authoritative | #33 (closed) |
+| Stray review report moved | `ISSUE-ACTION-REPORT.md` → `docs/reviews/2026-06-13-issue-action-report.md` | #34 (closed) |
+| Render-flag coverage | Documented `--out-dir` and `--concurrency` in `docs/getting-started.md` (verified against `render.ts`) | — |
+
+Repo root is now clean: `README.md`, `CHANGELOG.md`, `tutorial-forge-spec.md`.
+**These changes are in the working tree, uncommitted.**
+
+---
+
+## 2. Pedagogy review (instructional-designer)
+
+Full review: [`docs/reviews/2026-06-13-pedagogy-review.md`](2026-06-13-pedagogy-review.md).
+
+**Headline: the engine is pedagogy-strong; the docs are pedagogy-silent.** The pipeline
+already embodies several of the strongest evidence-based learning principles by good
+engineering — but the documentation teaches authors how to drive the machine, not how to
+write something that *teaches*.
+
+**Already right (credit):** narration-first pacing (temporal contiguity + coherence);
+cursor/callouts/zoom/focus (signaling); idle-speedup + pre-roll trim (extraneous-load
+reduction); per-language pacing (equity).
+
+**Gaps:**
+1. Narration guidance is *one sentence* — the highest-leverage teaching surface is
+   undocumented.
+2. No guidance on **step granularity / segmenting** (the most impactful principle); the
+   example tutorial is itself inconsistent.
+3. No **objective / recap** pattern documented (the example models both but never names
+   them).
+4. Captions framed purely as verbatim subtitles — no redundancy-vs-accessibility nuance.
+5. The README pitch centers on *freshness/maintenance*, not *learning* — the strategic gap.
+
+**Inherent ceiling:** the output is a linear MP4 — a worked example you *watch*. No
+practice, no retrieval, no learner pacing. That ceiling is where the boldest ideas aim.
+
+---
+
+## 3. Backlog triage (product-manager)
+
+The PM spot-checked the review's feasibility claims against the code (all three held: the
+manifest carries per-step boundaries `types.ts:188-203`; load-time validation exists
+`spec.ts:21-60`; intro/outro cards are pre-designed `spec.md:223` + the card-render path
+exists in `post/captions.ts`) and **filed 5 issues**:
+
+| # | Title | Bucket | Labels |
+|---|---|---|---|
+| **#35** | Chapters / segmenting from per-step manifest | **Now** | enhancement |
+| **#36** | Teaching-narration guidance + germane load-time lints | **Now** | documentation, enhancement |
+| **#37** | Objective + recap cards as first-class artifacts | **Next** | enhancement |
+| **#38** | Caption modes: verbatim vs. key-term | **Next** | enhancement |
+| **#39** | RFC / roadmap: from "maintained" to "effective" | **Later** | enhancement, question |
+
+**PM judgment calls:**
+- The three strategic bets (interactive HTML output, faded-practice mode, effectiveness
+  instrumentation) were **merged into one RFC/roadmap issue (#39)** — they're
+  interdependent (all presuppose a shared chaptered web player), not parallel, and reduce
+  to one upstream decision: *does the project expand beyond video?*
+- Accessibility-as-pedagogy (#6 in the review) was **folded into #36** rather than filed
+  separately (heavy overlap; `captionStyle` already exists). Can be split out if you want
+  it tracked as its own audit workstream.
+- An API correction was baked into the issues: the review said `captions:`; the real field
+  is `subtitles: 'burn' | 'sidecar' | 'off'` (`types.ts:130`).
+
+**Recommended sequencing:** #35 chapters first (it's the substrate for #37 card timing,
+#39's HTML player, and #39's analytics) → #36 narration docs+lints in parallel (no code
+dependency) → #37 cards → #38 caption modes → #39 only after a design doc.
+
+---
+
+## 4. Decisions needed from you (these block work)
+
+1. **Does tutorial-forge's scope expand beyond video?** Gates #39. An HTML player and a
+   live practice runner both reverse the explicit v1 non-goal "No interactive/HTML demos"
+   (`spec.md:21`). Product-identity call — nothing in #39 starts until you decide.
+2. **Lint default-on vs. opt-in** (#36). PM rec: over-long-narration on by default
+   (objective, high signal); missing-intro/recap behind opt-in strict mode (heuristic,
+   noisier).
+3. **Caption-modes API shape** (#38). PM rec: a new orthogonal option
+   (`captionText: 'verbatim' | 'keyterms'`) keeping delivery (`subtitles`) separate from
+   content, vs. overloading the `subtitles` enum.
+4. **Cards: visual-only or narrated?** (#37). PM rec: visual-only in v1 — authors already
+   speak the organizer in step-1 narration, so narrating cards too would manufacture the
+   redundancy the review warns against.
+
+---
+
+## Suggested next step
+
+If you're happy with the direction, the cheapest high-impact move is to greenlight
+**#35 (chapters)** and **#36 (narration guidance + lints)** — both ride existing
+infrastructure and lift every tutorial. #39 is parked behind decision #1.
+
+Also pending: the documentation fixes in §1 are **uncommitted** — say the word and I'll
+commit them (and these review/report files).
diff --git a/tutorial-forge-spec.md b/tutorial-forge-spec.md
index d333f15..34094c3 100644
--- a/tutorial-forge-spec.md
+++ b/tutorial-forge-spec.md
@@ -1,6 +1,8 @@
 # tutorial-forge — Specification
 
-> **Working name:** `tutorial-forge` (npm scope TBD, e.g. `@jb/tutorial-forge`). Rename freely; nothing below depends on the name.
+> **⚠️ Historical design document.** This is the pre-v0.1 design spec, kept for context on *why* the system is shaped the way it is. It is **not** current user documentation and some details have since changed (final package names, the full CLI flag set, the dropped Umami framing). For authoritative, up-to-date guidance see **[README.md](README.md)**, the **[docs/](docs/)** guides (`getting-started`, `writing-tutorials`, `adapters`), and **[CHANGELOG.md](CHANGELOG.md)**. Where this document and those disagree, those win.
+>
+> **Names as shipped:** library `tutorial-forge`, CLI package `tutorial-forge-cli`, binary `tutorial-forge` (alias `tforge`). The original spec used a `forge` binary throughout; read the `forge <cmd>` examples below as `tutorial-forge <cmd>`.
 
 ## 1. What this is