pnpm add -D tutorial-forge tutorial-forge-cli playwright
npx playwright install chromiumYou also need ffmpeg and ffprobe (≥ 6) on PATH. Verify everything with:
pnpm exec tutorial-forge doctorRun this from your project directory so doctor can read forge.config.ts and probe that the app at your adapter's baseURL is up — the most common render failure is simply forgetting to start the dev server. Add --setup (tutorial-forge doctor --setup) to go one step further and actually run adapter.setup once, tearing it down afterward — this catches a reachable-but-mispointed server (e.g. the dev server on the wrong database) that would otherwise pass reachability and then fail every render at sign-in.
pnpm note: if
pnpm exec tutorial-forge …trips pnpm's ignored-builds pre-check, invoke the bin directly (./node_modules/.bin/tutorial-forge …) or approve the build withpnpm approve-builds.
The binary is also installed under the shorter alias tforge.
Create forge.config.ts at your repo root (or wherever you run tutorial-forge):
import { defineConfig, ElevenLabs } from 'tutorial-forge';
import { myAdapter } from './e2e/my-adapter';
export default defineConfig({
adapter: myAdapter, // gets your app into a known state — see adapters.md
tts: ElevenLabs({ voiceId: 'daniel' }), // or OpenAITTS(), Piper(), SilentProvider()
outDir: 'tutorials/dist',
tutorials: ['tutorials/**/*.tutorial.ts'],
viewport: { width: 1920, height: 1080 },
});CLI flags override config; config overrides built-in defaults.
The forge CLI automatically loads a .env file from the directory you run it in (variables already set in your shell take precedence). Put provider credentials there and keep it out of version control — this repo's .gitignore already excludes .env.
# .env
ELEVENLABS_API_KEY=sk_...
# or
OPENAI_API_KEY=sk-...| Provider | Config | Credentials | Notes |
|---|---|---|---|
ElevenLabs({ voiceId }) |
required voiceId |
ELEVENLABS_API_KEY (or apiKey option) |
Best quality; see key setup below |
OpenAITTS({ voice }) |
optional voice, default alloy |
OPENAI_API_KEY (or apiKey option) |
|
Piper({ model }) |
path to a .onnx voice model |
none | Local/offline; needs the piper CLI installed |
SilentProvider() |
— | none | Deterministic silence; for tests and CI |
Narration audio is cached by content hash of provider + voice + text (~/.cache/tutorial-forge/tts by default), so each narration line is synthesized exactly once — re-renders and UI-only changes never hit the TTS API.
In the ElevenLabs dashboard, go to Settings → API Keys → Create API Key:
- Name it after your project (e.g.
tutorial-forge). - Enable Restrict Key.
- Under Endpoints, set Text to Speech → Access — leave everything else at No Access. The pipeline only ever calls
POST /v1/text-to-speech/{voiceId}; it does not need Voices, History, or any other endpoint (you pass the voice ID inforge.config.ts). - Optionally set a per-period credit limit as a spending cap; thanks to the cache, steady-state usage is only the lines you change.
To pick a voice, copy its ID from the ElevenLabs voice library and pass it as ElevenLabs({ voiceId: '...' }).
tutorial-forge list # discovered tutorials: id, title, step count
tutorial-forge render # render everything
tutorial-forge render --only my-id # iterate on one tutorial
tutorial-forge render --headed # watch the browser while recording
tutorial-forge render --phase post # re-merge without re-recording (uses .forge/<id>/)
tutorial-forge clean --cache # remove work dirs and the TTS cacheOutput: <outDir>/<tutorial-id>.mp4 plus a sidecar .srt (set subtitles: 'burn' to burn them in, 'off' to skip). Burned captions are rendered by the browser and composited with ffmpeg's built-in overlay filter — no libass needed, works on any ffmpeg build, and styles via captionStyle: { fontSizePx, maxWidthPx, bottomMarginPx } in config.
Zoom-on-callout: pass --zoom (or set zoom: true / zoom: { factor: 1.5 } in config) to smoothly zoom toward each click target and back out — the camera leads the click, holds through what it reveals, then releases. Composited in post from the timing manifest, so it adds nothing to recording time.
GIF export: pass --gif to also write an animated GIF next to the MP4 — fps-downsampled, palette-optimized, with narration captions burned in (GIFs are silent). Excerpt a step range with --gif-steps open-modal..create-event, or configure via gif: { widthPx, fps, captions, steps }. Perfect for READMEs and social posts.
Idle speed-up: pass --idle-speedup (or idleSpeedup: true / { maxIdleMs: 2000, speed: 3 }) to fast-forward narration-free waits — spinners, slow loads, long silent steps. Narration playback and click choreography always stay at 1x; audio offsets and subtitle cues are remapped to the shortened timeline automatically.
Two capture implementations, selected with --recorder or recorder: in config:
video(default) — Playwright'srecordVideo. The recording clock is aligned in post by detecting a calibration flash painted at the start.screencast— CDP frame capture with explicit per-frame timestamps: exact clock alignment by construction (no flash), and frames only arrive when content changes. Chromium-only (which the pipeline already is).
Output should be visually identical; screencast is the more precise mechanism and the likely future default.
When a step fails, the render throws a StepError naming the tutorial and step, the work dir is kept, and two artifacts are always written: a screenshot of the page at failure and the recent browser console/pageerror/requestfailed log.
For real investigation, re-run with --debug:
tutorial-forge render --only my-tutorial --debugDebug mode additionally records a full Playwright trace (trace.zip — open with npx playwright show-trace), writes the complete console log, and captures before/after screenshots for every step under .forge/<id>/steps/. Debug renders are slower and not for production output.
The TTS cache (~/.cache/tutorial-forge/tts by default, configurable via ttsCacheDir) is content-hashed by provider + narration text — cache it in CI and unchanged narration lines never hit the TTS API. Headless rendering needs a sane font set in containers: install fonts-liberation and fonts-noto-color-emoji.
The CLI is a thin shell over the library:
import { render, SilentProvider } from 'tutorial-forge';
import myTutorial from './tutorials/my.tutorial';
const result = await render(myTutorial, myAdapter, {
tts: SilentProvider(),
output: 'dist/my.mp4',
});
console.log(result.outputDurationMs, result.manifest);