diff --git a/CLAUDE.md b/CLAUDE.md index 4c8b6990a..b8fa6f199 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -67,4 +67,4 @@ will not match CI. Use it only for local-only experimentation. ## Skills -Composition authoring (not repo development) is guided by skills installed via `npx skills add heygen-com/hyperframes`. See `skills/` for source. Invoke `/hyperframes`, `/hyperframes-cli`, `/hyperframes-registry`, `/tailwind`, or `/gsap` when authoring compositions. Use `/tailwind` for projects created with `hyperframes init --tailwind` so agents follow the pinned Tailwind v4 browser-runtime contract instead of Studio's Tailwind v3 setup. Use `/animejs`, `/css-animations`, `/lottie`, `/three`, or `/waapi` when a composition uses those first-party runtime adapters. Invoke `/hyperframes-media` for asset preprocessing (TTS narration, audio/video transcription, background removal for transparent overlays) — these commands have their own skill so the CLI skill stays focused on the dev loop. When a user provides a website URL and wants a video, invoke `/website-to-hyperframes` — it runs the full 7-step capture-to-video pipeline. +Composition authoring (not repo development) is guided by skills installed via `npx skills add heygen-com/hyperframes`. See `skills/` for source. Invoke `/hyperframes`, `/hyperframes-cli`, `/hyperframes-registry`, `/tailwind`, or `/gsap` when authoring compositions. Use `/tailwind` for projects created with `hyperframes init --tailwind` so agents follow the pinned Tailwind v4 browser-runtime contract instead of Studio's Tailwind v3 setup. Use `/animejs`, `/css-animations`, `/lottie`, `/three`, or `/waapi` when a composition uses those first-party runtime adapters. Invoke `/hyperframes-media` for asset preprocessing (TTS narration, audio/video transcription, background removal for transparent overlays) — these commands have their own skill so the CLI skill stays focused on the dev loop. When a user provides a website URL and wants a video, invoke `/website-to-hyperframes` — it runs the full 7-step capture-to-video pipeline. When creating a PR with visual/UI changes, invoke `/pr-to-hyperframes` to generate a short walkthrough video and embed it in the PR description — reviewers see the changes in motion instead of reading diffs. diff --git a/skills/pr-to-hyperframes/.gitignore b/skills/pr-to-hyperframes/.gitignore new file mode 100644 index 000000000..6c7148c93 --- /dev/null +++ b/skills/pr-to-hyperframes/.gitignore @@ -0,0 +1,2 @@ +tmp/ +out/ diff --git a/skills/pr-to-hyperframes/SKILL.md b/skills/pr-to-hyperframes/SKILL.md new file mode 100644 index 000000000..b668af8ca --- /dev/null +++ b/skills/pr-to-hyperframes/SKILL.md @@ -0,0 +1,454 @@ +--- +name: pr-to-hyperframes +description: | + Create a narrated video walkthrough of a pull request with code slides, diff visualization, and audio narration. Pulls branding from the repo automatically. Use when: (1) the user asks for a PR walkthrough, PR video, or demo video, (2) you're about to create a PR with visual/UI changes and want to suggest a video, (3) the user says "make a PR video", "add a walkthrough", "record a demo for this PR". Triggers on: PR creation with visual diffs, explicit walkthrough requests, or when `gh pr create` is about to run on a branch with UI changes. +--- + +# PR walkthrough video + +Create a narrated walkthrough video for a pull request. 
+This provides the same benefit as a Loom video from the PR author — walking through the code changes, explaining what was done and why, so reviewers understand the PR quickly.
+
+**Input:** A GitHub pull request URL, PR number, or the current branch (auto-detects the PR).
+
+**Output:** An MP4 video at 1280x720 (30 fps) with audio narration, whisper-timed captions, and branded intro/outro slides, saved to `out/pr-<NNN>-walkthrough.mp4`.
+
+All intermediate files (audio, manifest, scripts) go in `tmp/pr-<NNN>/` relative to this skill directory. This directory is gitignored. Only the final `.mp4` lives at `out/`.
+
+Run commands that reference `./scripts` or `./video` from this skill directory.
+
+## Branding
+
+**The skill auto-detects branding from the repo.** It never hardcodes project-specific colors, logos, or names. At the start of every run, resolve branding:
+
+1. **Project name** — read `package.json` → `name` field (strip `@scope/` prefix). Fallback: git remote repo name. Fallback: directory name.
+2. **Colors** — read `design.md` or `DESIGN.md` if it exists (check both casings). Extract these specific tokens: `text` (body text color), `background` (page/slide background), `accent` (primary brand color for highlights/pills/progress). Map the closest values you find — design files vary in format. Fallback: neutral palette (`#09090b` text on `#ffffff` background, `#3b82f6` accent).
+3. **Fonts** — from `design.md` if present. Extract the body/display font and the monospace/code font. Fallback: `"Geist"` for body, `"Geist Mono"` for code.
+4. **Logo** — look for `logo.svg` or `logo.png` in repo root, `public/`, `assets/`, `.github/`. If not found, try `gh api orgs/<org> --jq .avatar_url` to get the org's GitHub avatar. If nothing found, use the project name as text.
+5. **Repo identifier** — parse `git remote get-url origin` for the `org/repo` slug (e.g., `acme/widget`).
+
+Pass these values to `build.mjs` via a `branding` key in the manifest:
+
+```json
+{
+  "branding": {
+    "name": "widget",
+    "org": "acme",
+    "repo": "acme/widget",
+    "logo": null,
+    "colors": {
+      "text": "#09090b",
+      "background": "#ffffff",
+      "accent": "#3b82f6",
+      "caption": "#ffd800",
+      "captionBg": "#09090b"
+    },
+    "fonts": {
+      "body": "Geist",
+      "mono": "Geist Mono"
+    }
+  }
+}
+```
+
+The **outro slide** shows the project logo/name and a subtle attribution line:
+
+```
+[Project Logo or Name]
+PR Walkthrough · #NNN
+Made with HyperFrames
+```
+
+The **footer bar** shows the project mark + name on the left, and `PR #NNN` on the right. The **PR body** attribution reads:
+
+```html
+Walkthrough by [HyperFrames](https://hyperframes.dev) — write HTML, render video.
+```
+
+This is the only HyperFrames mention. Everything else is the repo's own branding.
+
+## When to suggest (ambient mode)
+
+When you're about to run `gh pr create` or the user asks you to open a PR, check if the branch diff touches visual files:
+
+**Visual file patterns:**
+
+- `*.tsx`, `*.jsx`, `*.vue`, `*.svelte` with JSX/template markup
+- `*.css`, `*.scss`, `*.less`, `*.module.css`, `*.styled.*`
+- `*.html` files
+- Image assets (`*.png`, `*.jpg`, `*.svg`, `*.gif`, `*.webp`)
+- Tailwind config, theme files, design tokens
+- Storybook stories (`*.stories.*`)
+- Component library files
+
+**Skip suggestion** if the diff is purely backend, tests, docs, or dependency bumps. A minimal detection sketch follows below.
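+A minimal sketch of this check — assuming Node as the runtime and `main` as the base branch, matching the diff commands used in the workflow below; the pattern list is illustrative, not exhaustive:
+
+```js
+// detect-visual.mjs — hedged sketch of the ambient visual-change check.
+// "main" and the patterns are assumptions; adapt both to the repo.
+import { execSync } from "node:child_process";
+
+const VISUAL_PATTERNS = [
+  /\.(tsx|jsx|vue|svelte)$/, // component markup
+  /\.(css|scss|less)$/, // stylesheets (incl. *.module.css)
+  /\.styled\./, // CSS-in-JS
+  /\.html$/,
+  /\.(png|jpe?g|svg|gif|webp)$/, // image assets
+  /tailwind\.config\./, // theme / design tokens
+  /\.stories\./, // Storybook
+];
+
+const changed = execSync("git diff --name-only main..HEAD", { encoding: "utf8" })
+  .split("\n")
+  .filter(Boolean);
+
+const visual = changed.filter((f) => VISUAL_PATTERNS.some((re) => re.test(f)));
+if (visual.length > 0) {
+  console.log(`Visual changes in ${visual.length} file(s) — suggest a walkthrough video.`);
+}
+```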
+
+If visual changes are detected, suggest: _"This PR has visual changes — want me to generate a quick walkthrough video to embed in the description?"_
+
+If the user declines, proceed with the normal PR. Never push.
+
+## Philosophy
+
+**This is a walkthrough from the author's perspective.** The goal is the same as if the PR author sat down with a reviewer and walked them through the changes — showing specific code, explaining what changed and why, in an order that builds understanding.
+
+This means:
+
+- **The narration drives everything.** Write the walkthrough narration first, as a continuous explanation of the PR. Then figure out what should be on screen at each moment.
+- **Show the code.** The default visual is a code diff or source file. Text slides are the exception (intro, brief transitions, outro), not the rule.
+- **Walk through changes in a logical order**, not necessarily file order or commit order — always anchored to concrete code.
+- **Explain the "why", not just the "what".** The code on screen shows what changed. The narration adds the reasoning.
+
+## Workflow
+
+### Step 1: Understand the PR
+
+Read the PR commits, diff, and description. Understand the narrative arc:
+
+- What problem does this solve?
+- What's the approach?
+- What are the key mechanisms?
+
+```bash
+gh pr view --json title,body,commits
+git log main..HEAD --oneline
+git diff main..HEAD --stat
+```
+
+**Skip generated files.** When reading the diff, ignore auto-generated files:
+
+- Lockfiles (`yarn.lock`, `package-lock.json`, `bun.lockb`)
+- Generated docs, API reports, changelogs
+- Build output, bundled assets, source maps
+- Snapshots, schema dumps
+
+If unsure whether a file is generated, check for a "DO NOT EDIT" header. Filter these out when picking which files to feature.
+
+**Resolve branding.** Read `package.json`, `design.md`, check for logos, parse git remote. Build the `branding` object for the manifest.
+
+### Step 2: Write the narration
+
+Write the narration as continuous text, broken into logical segments. Each segment is a beat of the walkthrough. Save this as `tmp/pr-<NNN>/SCRIPT.md`.
+
+The narration should read like the author explaining the PR to a colleague: "So here's what we're doing... The core problem was X... The approach I took was Y... If you look at this function here..."
+
+Structure: intro → context/problem → code walkthrough → summary. See **Script structure** below.
+
+Avoid redundancy between intro and first content segment.
+
+### Step 3: Generate audio and timestamps
+
+Generate per-segment audio clips with one TTS call per segment:
+
+```bash
+./scripts/generate-audio.sh narration.json tmp/pr-<NNN>/
+```
+
+**API key:** Sourced from `.env` file (`GEMINI_API_KEY`).
+
+#### Narration JSON format
+
+```json
+{
+  "style": "Read the following walkthrough narration in a calm, steady, professional tone. Speak at a measured pace as if the author of a pull request were walking a colleague through the code changes.",
+  "voice": "Iapetus",
+  "slides": [
+    "This pull request adds group-aware binding resolution...",
+    "The core problem was that arrow bindings broke when...",
+    "If you look at the getBindingTarget method..."
+  ]
+}
+```
+
+- **`style`** — Voice persona and pacing instructions. Keep it short and specific.
+- **`voice`** — Gemini voice name (default: `Iapetus`).
+- **`slides`** — Array of narration text, one entry per segment.
+
+#### How it works
+
+1. For each segment, the script builds a prompt: style preamble + segment text.
+2. One API call to `gemini-2.5-pro-preview-tts` per segment generates a WAV clip directly.
+3. Each clip is validated (duration sanity check vs word count) and retried automatically if the output is bad.
+4. Leading/trailing silence is trimmed from each clip.
+
+**Output:** Per-segment audio clips (`audio-00.wav`, ...) and a `durations.json` file mapping each audio filename to its duration in seconds.
+
+**Dependencies:** ffmpeg / ffprobe. No Python packages required beyond the standard library.
+
+**Do NOT use** `[pause long]` or `[pause medium]` markup tags — the model may read them aloud literally.
+
+### Step 4: Write the manifest
+
+The manifest is a JSON file that describes every slide in the video. It bridges the narration/audio step and the hyperframes renderer.
+
+**The manifest schema below is the exact format `build.mjs` expects.** Do not invent your own slide structure, nest content in sub-objects, or rename fields. Copy the schema exactly — `build.mjs` reads `slide.type`, `slide.title`, `slide.diff`, `slide.code`, `slide.filename`, `slide.language`, `slide.audio`, `slide.durationInSeconds`, `slide.focus`, `slide.items`, `slide.src`, `slide.subtitle`, and `slide.date` as top-level fields on each slide object.
+
+Read the `durations.json` from step 3 to get the duration (in seconds) for each audio clip. Then write a `manifest.json` alongside the audio files:
+
+```json
+{
+  "pr": 142,
+  "branding": {
+    "name": "widget",
+    "org": "acme",
+    "repo": "acme/widget",
+    "logo": null,
+    "colors": {
+      "text": "#09090b",
+      "background": "#ffffff",
+      "accent": "#3b82f6",
+      "caption": "#ffd800",
+      "captionBg": "#09090b"
+    },
+    "fonts": { "body": "Geist", "mono": "Geist Mono" }
+  },
+  "slides": [
+    {
+      "type": "intro",
+      "title": "Fix canvas z-index layering #142",
+      "date": "May 15, 2026",
+      "audio": "audio-00.wav",
+      "durationInSeconds": 3.2
+    },
+    {
+      "type": "diff",
+      "filename": "packages/editor/editor.css",
+      "language": "css",
+      "diff": "@@ -12,7 +12,7 @@\n --z-canvas: 100;\n- --z-canvas-front: 600;\n+ --z-canvas-front: 250;",
+      "audio": "audio-01.wav",
+      "durationInSeconds": 25.8
+    },
+    {
+      "type": "code",
+      "filename": "packages/editor/src/Editor.ts",
+      "language": "typescript",
+      "code": "function getZIndex() {\n return 250\n}",
+      "audio": "audio-02.wav",
+      "durationInSeconds": 13.5
+    },
+    {
+      "type": "text",
+      "title": "Summary",
+      "subtitle": "Moved canvas-in-front from z-index 600 to 250.",
+      "audio": "audio-07.wav",
+      "durationInSeconds": 7.4
+    },
+    {
+      "type": "outro",
+      "durationInSeconds": 3
+    }
+  ]
+}
+```
+
+#### Slide types
+
+| Type      | Required fields                                              | Description                     |
+| --------- | ------------------------------------------------------------ | ------------------------------- |
+| `intro`   | `title`, `date`, `audio`, `durationInSeconds`                | Project name + title + date     |
+| `diff`    | `filename`, `language`, `diff`, `audio`, `durationInSeconds` | Syntax-highlighted unified diff |
+| `code`    | `filename`, `language`, `code`, `audio`, `durationInSeconds` | Syntax-highlighted source code  |
+| `text`    | `title`, `audio`, `durationInSeconds`                        | Title + optional `subtitle`     |
+| `list`    | `title`, `items`, `audio`, `durationInSeconds`               | Title + numbered items          |
+| `image`   | `src`, `audio`, `durationInSeconds`                          | Pre-rendered image (fallback)   |
+| `segment` | `title`, `durationInSeconds`                                 | Silent title card between parts |
+| `outro`   | `durationInSeconds`                                          | Project branding + attribution  |
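+
+The slide list itself is authored by hand (choosing types, diffs, and focus points), but the mechanical pairing of audio clips to durations can be scripted. A hedged sketch, assuming the `tmp/pr-<NNN>/` layout above — the `type`/`title` values here are placeholders to fill in during authoring:
+
+```js
+// scaffold-manifest.mjs — hedged sketch: pre-fill slides from durations.json.
+import fs from "node:fs";
+
+const dir = process.argv[2] ?? "."; // e.g. tmp/pr-<NNN>/
+const durations = JSON.parse(fs.readFileSync(`${dir}/durations.json`, "utf8"));
+
+// audio-00.wav, audio-01.wav, ... sort correctly thanks to zero-padding.
+const slides = Object.keys(durations)
+  .sort()
+  .map((audio) => ({
+    type: "text", // placeholder — swap in intro/diff/code/list per segment
+    title: audio, // placeholder
+    audio,
+    durationInSeconds: durations[audio],
+  }));
+
+slides.push({ type: "outro", durationInSeconds: 3 });
+
+// pr and branding are left for the authoring pass described above.
+fs.writeFileSync(
+  `${dir}/manifest.json`,
+  JSON.stringify({ pr: 0, branding: {}, slides }, null, 2),
+);
+```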
+
+#### Animated scroll with `focus`
+
+For longer diffs or code (more than ~30 lines), the renderer keeps the font at a readable 16px and uses an animated viewport that scrolls between focus points. Add a `focus` array to `diff` or `code` slides:
+
+```json
+{
+  "type": "diff",
+  "filename": "src/lib/Editor.ts",
+  "language": "typescript",
+  "diff": "... 60-line diff ...",
+  "focus": [
+    { "line": 3, "at": 0 },
+    { "line": 25, "at": 0.4 },
+    { "line": 50, "at": 0.8 }
+  ],
+  "audio": "audio-03.wav",
+  "durationInSeconds": 30
+}
+```
+
+- **`line`** — The line number (0-indexed) to center on screen.
+- **`at`** — When to arrive at this position, as a fraction of the slide's duration (0 = start, 1 = end).
+
+**When to use focus:** Any diff or code slide with more than ~30 lines.
+**When to omit focus:** Short diffs (<=30 lines) fit on screen and don't need scrolling.
+
+#### Writing diff fields
+
+For `diff` slides, paste the **unified diff** for the relevant hunk(s) — the output of `git diff` for that section, including the `@@` hunk header and `+`/`-`/` ` line prefixes. The renderer parses these to apply green/red backgrounds.
+
+```bash
+git diff main..HEAD -- path/to/file.ts
+```
+
+Include only the relevant hunks. Strip the `diff --git` and `---`/`+++` header lines — start from `@@`.
+
+#### Segment title slides
+
+Insert a **`segment` slide** before each content segment to introduce it — except before the intro and context segments. Each segment slide is **3 seconds of silence** with the segment title centered.
+
+```json
+{
+  "type": "segment",
+  "title": "State machine refactor",
+  "durationInSeconds": 3
+}
+```
+
+#### Segment title labels on code/diff slides
+
+Add a `title` field to `code` and `diff` slides to show a small label in the top-left corner identifying the current segment. Use the same title as the preceding `segment` slide.
+
+### Step 5: Render the video
+
+Run the `render.sh` script:
+
+```bash
+./video/render.sh \
+  tmp/pr-<NNN>/manifest.json \
+  out/pr-<NNN>-walkthrough.mp4
+```
+
+The script:
+
+1. Copies referenced audio/image files into `video/assets/`.
+2. Runs whisper transcription on each audio file → `video/transcripts/audio-NN.json` (idempotent).
+3. Runs `build.mjs <manifest.json>` to generate `video/index.html` — a hyperframes composition with timed clips, GSAP timeline for transitions and code-focus pans, and captions derived from whisper transcripts.
+4. Lints and renders 1920x1080 frames via `npx hyperframes render`.
+5. Downscales to 1280x720 / 30fps and recompresses with ffmpeg (CRF 26 + AAC 96k).
+
+**Dependencies:** Node.js 22+, ffmpeg, Python 3. `hyperframes` is invoked via `npx --yes`.
+
+### Step 6: Embed in PR
+
+After rendering, embed the video in the PR body:
+
+```bash
+# Add to existing PR:
+gh pr edit --body "$(gh pr view --json body -q .body)
+
+## Visual Walkthrough
+
+<video-url>
+
+Walkthrough by [HyperFrames](https://hyperframes.dev) — write HTML, render video.
+"
+```
+
+Or include the video section in the initial `gh pr create --body` when creating a new PR.
+
+#### Caption sync via whisper
+
+Captions appear as colored text on a solid dark pill at the bottom. Start/end times come from word-level whisper transcripts grouped into 5-7 word chunks, breaking on natural pauses (>450ms gaps). Whisper may transcribe brand names phonetically — acceptable for captions.
+
+#### File size knobs
+
+Default targets ~30-60 MB for an 8-minute video. To tune:
+
+- `--crf <value>` in the ffmpeg step: 22 is near-lossless, 26 is default, 30+ is smaller.
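+
+If a render overshoots the budget, the re-encode can be retried at higher CRF values. A hedged sketch (a hypothetical helper, reusing the ffmpeg flags render.sh already applies):
+
+```js
+// shrink.mjs — hedged sketch: step up CRF until the file fits a byte budget.
+import { execSync } from "node:child_process";
+import { statSync } from "node:fs";
+
+const [input, output] = process.argv.slice(2);
+const BUDGET_BYTES = 60 * 1024 * 1024; // upper end of the ~30-60 MB target
+
+for (const crf of [26, 28, 30]) {
+  // Always re-encode from the original input, not the previous pass.
+  execSync(
+    `ffmpeg -y -i "${input}" -c:v libx264 -preset slow -crf ${crf} ` +
+      `-pix_fmt yuv420p -c:a aac -b:a 96k -movflags +faststart "${output}"`,
+    { stdio: "ignore" },
+  );
+  if (statSync(output).size <= BUDGET_BYTES) break;
+}
+```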
+
+## File organization
+
+```
+pr-to-hyperframes/
+├── SKILL.md              # This file
+├── scripts/              # CLI tools (checked in)
+│   ├── generate-audio.sh # narration.json → per-slide WAVs + durations.json
+│   └── make-video.sh     # Static slide + audio assembly fallback
+├── video/                # Hyperframes project (checked in)
+│   ├── hyperframes.json  # hyperframes config
+│   ├── meta.json         # project meta
+│   ├── build.mjs         # manifest.json → index.html composition
+│   ├── render.sh         # manifest.json → 720p MP4 (full pipeline)
+│   ├── assets/           # Auto-populated at render time (gitignored)
+│   ├── transcripts/      # Whisper word-level JSON (gitignored, cached)
+│   └── renders/          # Intermediate 1080p renders (gitignored)
+├── out/                  # Final outputs (gitignored)
+│   └── pr-XXXX-walkthrough.mp4
+└── tmp/                  # Intermediate files (gitignored)
+    └── pr-XXXX/
+        ├── SCRIPT.md      # Narration script
+        ├── narration.json # Input to generate-audio.sh
+        ├── durations.json # Audio durations
+        ├── manifest.json  # Input to render.sh
+        └── audio-XX.wav   # Per-segment audio clips
+```
+
+## API configuration
+
+- **Gemini API key:** Stored as `GEMINI_API_KEY` in the project root `.env`.
+- **TTS model:** `gemini-2.5-pro-preview-tts`
+- **TTS voice:** `Iapetus` (default)
+
+## Script structure
+
+The walkthrough follows a consistent narrative arc. 8-12 segments total, with the vast majority showing code.
+
+### Intro (1 segment)
+
+The intro card: project logo/name + PR title + date. The narration should be a single sentence framing what the PR does at a high level.
+
+Manifest slide type: `intro`.
+
+### Context (0-1 segments)
+
+Brief orientation before diving into code. What was the situation before this PR? What problem motivated the work?
+
+- Be concrete: "Arrow bindings broke when the target was inside a group" not "There were issues with bindings"
+- Name the area of the codebase affected
+
+If context can be explained while showing the first piece of code, skip the standalone context segment.
+
+Manifest slide type: `text` or `diff`.
+
+### Code walkthrough (6-10 segments)
+
+The bulk of the video. Walk through actual code changes, showing diffs and files while explaining what was done and why.
+
+**Every segment should show code.** Use `diff` slides for changes and `code` slides for unchanged reference code.
+
+- **Name files and functions.** Every segment should reference at least one specific file or function.
+- **Show the diff.** Use `git diff main..HEAD -- path/to/file` and extract relevant hunks.
+- **Order by understanding, not by file.** Present changes in the order that builds comprehension.
+- **Explain the "why", not just the "what".**
+- **Skip boilerplate, but mention it.** "There are also some type exports added in `index.ts`."
+- **Group related small changes.** If three files got the same one-line fix, one segment covers all three.
+
+### Summary (1 segment)
+
+Brief recap of what the PR accomplished. A sentence or two summarizing the change, mentioning known limitations or follow-up work.
+
+Manifest slide type: `text`.
+
+### Outro (1 segment, silent)
+
+The project logo/name, a subtle "Made with HyperFrames" line, 3 seconds of silence.
+
+Manifest slide type: `outro` with `durationInSeconds: 3`.
+
+## Narration writing tips
+
+- **Be specific about code.** Say "In `BindingUtil.ts`, the `onAfterChange` handler now checks for group ancestors" — not "The binding system was updated."
+- **Each segment = one change or closely related group.**
+- **Write as the author.** "So the main thing here is..." or "The tricky part was..." are fine.
+- **Avoid redundancy** between intro and first content segment.
+- **Mention files that aren't shown.** If a PR touches 15 files but only 6 are interesting, briefly acknowledge the others.
+- **Duration estimation:** professional narration pace is ~2.5 words/second. Count the words in each segment's narration text and divide by 2.5 to get `durationInSeconds`. Add 1-2 seconds for visual-only moments (intro reveal, diff highlight pause). A 50-word segment ≈ 22 seconds.
+- Aim for **5-7 minutes** total narration for large PRs, **1-3 minutes** for small fixes.
+
+## Checklist
+
+- [ ] Resolve repo branding (name, colors, fonts, logo)
+- [ ] Read all PR commits and understand the full diff
+- [ ] Write narration in SCRIPT.md (8-12 segments)
+- [ ] Generate per-segment audio (Iapetus voice)
+- [ ] Read durations.json to get per-segment durations
+- [ ] Write manifest.json with slide types, diffs/code, audio refs, and branding
+- [ ] Render video with render.sh
+- [ ] Verify final output: 1280x720 / 30 fps, audio synced, captions readable, outro present
+- [ ] Embed video in PR body with HyperFrames attribution
diff --git a/skills/pr-to-hyperframes/scripts/generate-audio.sh b/skills/pr-to-hyperframes/scripts/generate-audio.sh
new file mode 100755
index 000000000..1f479045f
--- /dev/null
+++ b/skills/pr-to-hyperframes/scripts/generate-audio.sh
@@ -0,0 +1,263 @@
+#!/bin/bash
+# generate-audio.sh — Generate walkthrough narration audio from a JSON script.
+#
+# Generates one TTS call per segment, producing individual WAV clips directly.
+# No chunking, alignment, or splitting needed.
+#
+# Usage:
+#   ./generate-audio.sh <narration.json> [output-dir]
+#
+# Input JSON format:
+# {
+#   "style": "Read in a calm, steady, professional tone...",
+#   "voice": "Iapetus",          (optional, default: Iapetus)
+#   "slides": [
+#     "Intro narration text...",
+#     "Problem slide narration...",
+#     "Approach narration...",
+#     ...
+#   ]
+# }
+#
+# Output:
+#   <output-dir>/audio-00.wav, audio-01.wav, ...
+#   <output-dir>/durations.json
+#
+# Dependencies:
+#   ffmpeg / ffprobe
+#
+# Environment:
+#   GEMINI_API_KEY — required. Auto-sourced from .env if not set.
+#
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
+# --- Args ---
+SCRIPT_JSON="${1:?Usage: generate-audio.sh <narration.json> [output-dir]}"
+OUTPUT_DIR="${2:-.}"
+
+# Resolve relative paths
+[[ "$SCRIPT_JSON" != /* ]] && SCRIPT_JSON="$(pwd)/$SCRIPT_JSON"
+[[ "$OUTPUT_DIR" != /* ]] && OUTPUT_DIR="$(pwd)/$OUTPUT_DIR"
+
+if [ !
-f "$SCRIPT_JSON" ]; then + echo "Error: ${SCRIPT_JSON} not found" + exit 1 +fi + +mkdir -p "$OUTPUT_DIR" + +PYTHON="python3" + +# --- API key --- +REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || echo ".") + +if [ -z "${GEMINI_API_KEY:-}" ]; then + if [ -f "${REPO_ROOT}/.env" ]; then + export $(grep '^GEMINI_API_KEY=' "${REPO_ROOT}/.env" | xargs) 2>/dev/null || true + fi +fi +GEMINI_API_KEY="${GEMINI_API_KEY:?Set GEMINI_API_KEY environment variable or add it to .env}" + +# --- Config --- +TTS_MODEL="gemini-2.5-pro-preview-tts" +TTS_ENDPOINT="https://generativelanguage.googleapis.com/v1beta/models/${TTS_MODEL}:generateContent" +SPEED=1.2 + +# --- Run everything in Python for reliability --- +"$PYTHON" - "$SCRIPT_JSON" "$OUTPUT_DIR" "$GEMINI_API_KEY" "$TTS_MODEL" "$TTS_ENDPOINT" "$SPEED" <<'PYTHON_SCRIPT' +import json, sys, os, subprocess, base64, urllib.request, re + +script_json = sys.argv[1] +output_dir = sys.argv[2] +api_key = sys.argv[3] +tts_model = sys.argv[4] +tts_endpoint = sys.argv[5] +speed = float(sys.argv[6]) + +MAX_RETRIES = 2 + +def api_call(endpoint, body_dict): + body = json.dumps(body_dict).encode() + req = urllib.request.Request( + f"{endpoint}?key={api_key}", + data=body, + headers={"Content-Type": "application/json"}, + method="POST", + ) + with urllib.request.urlopen(req) as resp: + return json.loads(resp.read()) + +# --- Load narration --- +with open(script_json) as f: + data = json.load(f) + +voice = data.get("voice", "Iapetus") +slides = data["slides"] +style = data.get("style", + "Read the following in a calm, steady, professional tone. " + "Speak at a measured pace.") + +word_count = sum(len(s.split()) for s in slides) +print(f"=== Generating narration audio ===") +print(f" Voice: {voice}") +print(f" Slides: {len(slides)}") +print(f" Words: {word_count}") +print() + +def call_tts(prompt_text): + response = api_call(tts_endpoint, { + "contents": [{"parts": [{"text": prompt_text}]}], + "generationConfig": { + "responseModalities": ["AUDIO"], + "speechConfig": { + "voiceConfig": { + "prebuiltVoiceConfig": { + "voiceName": voice + } + } + } + } + }) + + error_msg = response.get("error", {}).get("message", "") + if error_msg: + raise RuntimeError(f"TTS API error: {error_msg}") + + return base64.b64decode(response["candidates"][0]["content"]["parts"][0]["inlineData"]["data"]) + +def pcm_to_wav(pcm_bytes, out_wav): + pcm_tmp = out_wav + ".pcm" + with open(pcm_tmp, "wb") as f: + f.write(pcm_bytes) + subprocess.run([ + "ffmpeg", "-y", "-f", "s16le", "-ar", "24000", "-ac", "1", + "-i", pcm_tmp, "-af", f"atempo={speed}", "-ar", "48000", out_wav + ], capture_output=True, check=True) + os.remove(pcm_tmp) + +def get_duration(wav_path): + result = subprocess.run( + ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "csv=p=0", wav_path], + capture_output=True, text=True + ) + return float(result.stdout.strip()) + +def validate_duration(wav_path, word_count): + dur = get_duration(wav_path) + expected = word_count / 150 * 60 / speed + lower = expected * 0.3 + upper = expected * 3.0 + if word_count < 15: + return dur < 30, dur + return lower <= dur <= upper, dur + +# --- Generate one TTS call per segment --- +durations = {} + +for i, text in enumerate(slides): + num = f"{i:02d}" + out_path = os.path.join(output_dir, f"audio-{num}.wav") + wc = len(text.split()) + prompt = f"{style}\n\n{text}" + + ok = False + for attempt in range(MAX_RETRIES + 1): + try: + label = f" [{num}] " + ("" if attempt == 0 else f"(retry {attempt}) ") + print(f"{label}Generating 
({wc} words)...", end=" ", flush=True) + pcm_data = call_tts(prompt) + pcm_to_wav(pcm_data, out_path) + ok, dur = validate_duration(out_path, wc) + if ok: + print(f"{dur:.1f}s") + durations[f"audio-{num}.wav"] = round(dur, 2) + break + else: + expected = wc / 150 * 60 / speed + print(f"{dur:.1f}s (expected ~{expected:.0f}s, retrying)") + except (urllib.error.HTTPError, RuntimeError) as e: + print(f"error: {e}") + if attempt == MAX_RETRIES: + print(f" [error] Segment {i} failed after {MAX_RETRIES + 1} attempts") + sys.exit(1) + + if not ok: + dur = get_duration(out_path) + durations[f"audio-{num}.wav"] = round(dur, 2) + print(f" [warn] Segment {i} audio may be unreliable ({dur:.1f}s for {wc} words)") + +# --- Trim silence from each clip --- +MAX_SILENCE = 0.15 +SILENCE_THRESHOLD = "-40dB" +print() +print("=== Trimming silence ===") + +for i in range(len(slides)): + num = f"{i:02d}" + clip_path = os.path.join(output_dir, f"audio-{num}.wav") + + detect = subprocess.run([ + "ffmpeg", "-i", clip_path, "-af", + f"silencedetect=noise={SILENCE_THRESHOLD}:d=0.1", + "-f", "null", "-" + ], capture_output=True, text=True) + stderr = detect.stderr + + clip_dur = get_duration(clip_path) + + silence_starts = re.findall(r'silence_start: ([\d.]+)', stderr) + silence_ends = re.findall(r'silence_end: ([\d.]+)', stderr) + + trim_start = 0.0 + if silence_starts and float(silence_starts[0]) < 0.05: + if silence_ends: + leading_silence = float(silence_ends[0]) + if leading_silence > MAX_SILENCE: + trim_start = leading_silence - MAX_SILENCE + + trim_end = clip_dur + is_last = (i == len(slides) - 1) + if not is_last and silence_starts: + last_silence_start = float(silence_starts[-1]) + last_silence_is_trailing = True + for se in silence_ends: + se_val = float(se) + if se_val > last_silence_start and se_val < clip_dur - 0.05: + last_silence_is_trailing = False + break + if last_silence_is_trailing and last_silence_start > 0.05: + trailing_silence = clip_dur - last_silence_start + if trailing_silence > MAX_SILENCE: + trim_end = last_silence_start + MAX_SILENCE + + if trim_start > 0 or trim_end < clip_dur: + trimmed_path = clip_path + ".tmp.wav" + subprocess.run([ + "ffmpeg", "-y", "-i", clip_path, + "-ss", str(trim_start), "-to", str(trim_end), + "-c", "copy", trimmed_path + ], capture_output=True) + os.replace(trimmed_path, clip_path) + new_dur = trim_end - trim_start + durations[f"audio-{num}.wav"] = round(new_dur, 2) + print(f" audio-{num}.wav: {clip_dur:.1f}s -> {new_dur:.1f}s (trimmed {clip_dur - new_dur:.1f}s)") + else: + print(f" audio-{num}.wav: {clip_dur:.1f}s (no trim needed)") + +# --- Write durations.json --- +durations_path = os.path.join(output_dir, "durations.json") +with open(durations_path, "w") as f: + json.dump(durations, f, indent=2) + +total_dur = sum(durations.values()) +print(f"\n Wrote durations.json ({len(durations)} entries, {total_dur:.1f}s total)") + +print() +print("=== Done ===") +PYTHON_SCRIPT + +echo "" +echo "Output:" +ls -la "${OUTPUT_DIR}"/audio-*.wav 2>/dev/null || echo " (no files generated)" diff --git a/skills/pr-to-hyperframes/scripts/make-video.sh b/skills/pr-to-hyperframes/scripts/make-video.sh new file mode 100755 index 000000000..12b4a3bd9 --- /dev/null +++ b/skills/pr-to-hyperframes/scripts/make-video.sh @@ -0,0 +1,83 @@ +#!/bin/bash +# make-video.sh — Assemble walkthrough slides + audio into a final MP4. +# Fallback for when hyperframes render is not available. +# +# Usage: +# ./make-video.sh [outro-duration] +# +# Expects in : +# slide-00.png, slide-01.png, ... 
(one per segment, including outro) +# audio-00.wav, audio-01.wav, ... (one per narrated segment) +# +# The last slide PNG without a matching audio WAV is the silent outro. +# +set -euo pipefail + +SLIDE_DIR="${1:?Usage: make-video.sh [outro-duration]}" +OUTPUT="${2:?Usage: make-video.sh [outro-duration]}" +OUTRO_DUR="${3:-3}" + +# Resolve relative paths +[[ "$SLIDE_DIR" != /* ]] && SLIDE_DIR="$(pwd)/$SLIDE_DIR" +[[ "$OUTPUT" != /* ]] && OUTPUT="$(pwd)/$OUTPUT" + +mkdir -p "$(dirname "$OUTPUT")" + +TMPDIR_WORK=$(mktemp -d) +trap "rm -rf $TMPDIR_WORK" EXIT + +echo "=== Assembling video ===" +echo " Slides: $SLIDE_DIR" +echo " Output: $OUTPUT" + +SLIDE_COUNT=$(ls "$SLIDE_DIR"/slide-*.png 2>/dev/null | wc -l | tr -d ' ') +AUDIO_COUNT=$(ls "$SLIDE_DIR"/audio-*.wav 2>/dev/null | wc -l | tr -d ' ') + +echo " Found $SLIDE_COUNT slides, $AUDIO_COUNT audio clips" +echo " Last slide (no audio) = outro (${OUTRO_DUR}s)" + +CONCAT_LIST="$TMPDIR_WORK/concat.txt" +> "$CONCAT_LIST" + +for i in $(seq 0 $((SLIDE_COUNT - 1))); do + NUM=$(printf "%02d" $i) + SLIDE="$SLIDE_DIR/slide-${NUM}.png" + AUDIO="$SLIDE_DIR/audio-${NUM}.wav" + SEGMENT="$TMPDIR_WORK/segment-${NUM}.mp4" + + if [ -f "$AUDIO" ]; then + DUR=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$AUDIO") + echo " segment-${NUM}: slide + audio (${DUR}s)" + + ffmpeg -y -loop 1 -i "$SLIDE" -i "$AUDIO" \ + -c:v libx264 -tune stillimage -pix_fmt yuv420p \ + -vf "scale=1600:900:force_original_aspect_ratio=decrease,pad=1600:900:(ow-iw)/2:(oh-ih)/2" \ + -c:a aac -b:a 192k -ar 48000 \ + -shortest -movflags +faststart \ + "$SEGMENT" 2>/dev/null + else + echo " segment-${NUM}: silent outro (${OUTRO_DUR}s)" + ffmpeg -y -loop 1 -i "$SLIDE" -f lavfi -i anullsrc=r=48000:cl=mono \ + -c:v libx264 -tune stillimage -pix_fmt yuv420p \ + -vf "scale=1600:900:force_original_aspect_ratio=decrease,pad=1600:900:(ow-iw)/2:(oh-ih)/2" \ + -c:a aac -b:a 192k -ar 48000 \ + -t "$OUTRO_DUR" -movflags +faststart \ + "$SEGMENT" 2>/dev/null + fi + + echo "file '$SEGMENT'" >> "$CONCAT_LIST" +done + +echo "" +echo " Concatenating ${SLIDE_COUNT} segments..." +ffmpeg -y -f concat -safe 0 -i "$CONCAT_LIST" \ + -c copy -movflags +faststart \ + "$OUTPUT" 2>/dev/null + +FINAL_DUR=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$OUTPUT") +FINAL_SIZE=$(ls -lh "$OUTPUT" | awk '{print $5}') +echo "" +echo "=== Done ===" +echo " Output: $OUTPUT" +echo " Duration: ${FINAL_DUR}s" +echo " Size: $FINAL_SIZE" diff --git a/skills/pr-to-hyperframes/video/.gitignore b/skills/pr-to-hyperframes/video/.gitignore new file mode 100644 index 000000000..4f4dd1832 --- /dev/null +++ b/skills/pr-to-hyperframes/video/.gitignore @@ -0,0 +1,6 @@ +# Generated by render.sh +index.html +assets/ +transcripts/ +renders/ +node_modules/ diff --git a/skills/pr-to-hyperframes/video/build.mjs b/skills/pr-to-hyperframes/video/build.mjs new file mode 100644 index 000000000..c8f07aa69 --- /dev/null +++ b/skills/pr-to-hyperframes/video/build.mjs @@ -0,0 +1,838 @@ +// build.mjs — Generate index.html for a PR walkthrough video from a manifest +// JSON file. Reads slide definitions and branding config, then emits one HTML +// composition with timed clips + a single GSAP timeline driving slide +// transitions, code-focus pans, and captions sourced from whisper word-level +// transcripts of each audio file. 
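+//
+// Each transcripts/audio-NN.json is consumed as a word-level array of
+// { "text": "...", "start": 0.12, "end": 0.40 } objects (seconds relative to
+// the clip start); chunkTranscript() below groups them into caption chunks.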
+// +// Usage: +// node build.mjs +// +// Expects: +// - audio-NN.wav files alongside the manifest (referenced by slide.audio) +// - copies of those files in ./assets/audio-NN.wav (done by render.sh) +// - whisper transcripts in ./transcripts/audio-NN.json (done by render.sh) +// - manifest.branding for project-specific colors, fonts, and name +// +// Output: ./index.html (the hyperframes composition) + +import fs from "node:fs"; +import path from "node:path"; +import url from "node:url"; + +const __dirname = path.dirname(url.fileURLToPath(import.meta.url)); + +// --- Args -------------------------------------------------------------------- + +const manifestPath = process.argv[2]; +if (!manifestPath) { + console.error("Usage: node build.mjs "); + process.exit(1); +} +const manifestAbs = path.resolve(manifestPath); +if (!fs.existsSync(manifestAbs)) { + console.error(`Manifest not found: ${manifestAbs}`); + process.exit(1); +} + +const manifest = JSON.parse(fs.readFileSync(manifestAbs, "utf8")); + +// --- Branding ---------------------------------------------------------------- + +const DEFAULT_BRANDING = { + name: "Project", + org: "", + repo: "", + logo: null, + colors: { + text: "#09090b", + background: "#ffffff", + accent: "#3b82f6", + caption: "#ffd800", + captionBg: "#09090b", + }, + fonts: { + body: "Geist", + mono: "Geist Mono", + }, +}; + +const brand = { ...DEFAULT_BRANDING, ...manifest.branding }; +brand.colors = { ...DEFAULT_BRANDING.colors, ...(manifest.branding?.colors || {}) }; +brand.fonts = { ...DEFAULT_BRANDING.fonts, ...(manifest.branding?.fonts || {}) }; + +const repoSlug = brand.repo || `${brand.org}/${brand.name}`; + +// --- Whisper transcripts ----------------------------------------------------- + +const TRANSCRIPTS_DIR = path.join(__dirname, "transcripts"); +const transcripts = new Map(); +if (fs.existsSync(TRANSCRIPTS_DIR)) { + for (const f of fs.readdirSync(TRANSCRIPTS_DIR)) { + if (!f.endsWith(".json")) continue; + const audioName = f.replace(/\.json$/, ".wav"); + transcripts.set(audioName, JSON.parse(fs.readFileSync(path.join(TRANSCRIPTS_DIR, f), "utf8"))); + } +} + +function chunkTranscript(words, { maxWords = 7, gapThreshold = 0.45 } = {}) { + const chunks = []; + let current = []; + for (const w of words) { + if (current.length === 0) { + current.push(w); + continue; + } + const prev = current[current.length - 1]; + const gap = w.start - prev.end; + if (gap > gapThreshold || current.length >= maxWords) { + chunks.push(current); + current = [w]; + } else { + current.push(w); + } + } + if (current.length) chunks.push(current); + return chunks.map((group) => ({ + text: group.map((g) => g.text).join(" "), + start: group[0].start, + end: group[group.length - 1].end, + })); +} + +function makeCaptions(audioFile, audioStart) { + const words = transcripts.get(audioFile); + if (!words) return []; + const chunks = chunkTranscript(words); + return chunks.map((c) => ({ + text: c.text, + start: audioStart + c.start, + duration: Math.max(0.4, c.end - c.start), + })); +} + +// --- Cumulative timing ------------------------------------------------------- + +let cursor = 0; +const timed = manifest.slides.map((slide, i) => { + const start = cursor; + const duration = slide.durationInSeconds; + cursor += duration; + return { slide, start, duration, i }; +}); +const totalDuration = cursor; + +// --- HTML escape ------------------------------------------------------------- + +function esc(s) { + return String(s).replace(/&/g, "&").replace(//g, ">"); +} + +// --- Light syntax 
highlighting ----------------------------------------------- + +const KEYWORDS = new Set([ + "abstract", + "as", + "async", + "await", + "boolean", + "break", + "case", + "catch", + "class", + "const", + "constructor", + "continue", + "default", + "delete", + "do", + "else", + "enum", + "export", + "extends", + "false", + "finally", + "for", + "from", + "function", + "get", + "if", + "implements", + "import", + "in", + "instanceof", + "interface", + "is", + "let", + "new", + "null", + "number", + "of", + "override", + "private", + "protected", + "public", + "readonly", + "return", + "set", + "static", + "string", + "super", + "switch", + "this", + "throw", + "true", + "try", + "type", + "typeof", + "undefined", + "void", + "while", + "yield", + "any", + "never", + "unknown", +]); + +function highlightLine(line) { + const re = + /(\/\/.*$)|(\/\*[\s\S]*?\*\/)|('(?:\\.|[^'\\])*')|("(?:\\.|[^"\\])*")|(`(?:\\.|[^`\\])*`)|(\b\d+(?:\.\d+)?\b)|(\b[A-Za-z_$][\w$]*\b)|(@\w+)/g; + let out = ""; + let last = 0; + for (const m of line.matchAll(re)) { + out += esc(line.slice(last, m.index)); + const [tok, comment, block, sq, dq, bt, num, ident, decorator] = m; + if (comment || block) out += `${esc(tok)}`; + else if (sq || dq || bt) out += `${esc(tok)}`; + else if (num) out += `${esc(tok)}`; + else if (decorator) out += `${esc(tok)}`; + else if (ident) { + if (KEYWORDS.has(ident)) out += `${esc(ident)}`; + else if (/^[A-Z]/.test(ident)) out += `${esc(ident)}`; + else out += esc(ident); + } + last = m.index + tok.length; + } + out += esc(line.slice(last)); + return out || " "; +} + +// --- Logo -------------------------------------------------------------------- + +function renderLogo() { + if (brand.logo) { + const ext = path.extname(brand.logo).toLowerCase(); + if (ext === ".svg") { + const svgPath = path.resolve(brand.logo); + if (fs.existsSync(svgPath)) { + return fs.readFileSync(svgPath, "utf8"); + } + } + return ``; + } + return `${esc(brand.name)}`; +} + +// --- Slide renderers --------------------------------------------------------- + +function slideAttrs({ start, duration, i }, extra = "") { + const initialStyle = i === 0 ? ` style="opacity: 1"` : ""; + return `class="clip slide" data-start="${start}" data-duration="${duration}" data-track-index="2" id="slide-${i}"${initialStyle} ${extra}`; +} + +function renderIntro({ slide, start, duration, i }) { + const title = slide.title || `PR #${manifest.pr}`; + const cleanTitle = title.replace(/\s*#\d+\s*$/, "").trim(); + const words = cleanTitle.split(/\s+/); + const highlightIndex = Math.max(0, words.length - 2); + const highlighted = words + .map((w, n) => (n === highlightIndex ? `${esc(w)}` : esc(w))) + .join(" "); + + return ` +
+
+
+
+ Pull Request + ${esc(repoSlug)} · #${manifest.pr} +
+

${highlighted}

+ ${slide.subtitle ? `

${esc(slide.subtitle)}

` : ""} +
+ ${slide.date ? `${esc(slide.date)}` : ""} + Walkthrough +
+
+
`; +} + +function renderSegment({ slide, start, duration, i }) { + return ` +
+
+
+
+

${esc(slide.title || "")}

+
+
+
`; +} + +function renderCode({ slide, start, duration, i }) { + const lines = (slide.code || "").split("\n"); + const focus = slide.focus || [{ line: 0, at: 0 }]; + const codeLines = lines + .map( + (l, n) => + `
${String(n + 1).padStart(2, " ")}${highlightLine(l)}
`, + ) + .join(""); + const focusJson = JSON.stringify(focus); + return ` +
+
+
+
+ ${esc(slide.language || "ts")} + ${esc(slide.filename || "")} + ${esc(slide.title || "")} +
+
+
+ ${codeLines} +
+
+
+
+
+
`; +} + +function renderDiffLines(diff) { + const lines = diff.split("\n"); + return lines + .map((l) => { + let cls = "dl"; + let mark = ""; + if (l.startsWith("@@")) { + cls += " dl-hunk"; + mark = "⋯"; + } else if (l.startsWith("+++") || l.startsWith("---")) { + cls += " dl-meta"; + } else if (l.startsWith("+")) { + cls += " dl-add"; + mark = "+"; + } else if (l.startsWith("-")) { + cls += " dl-del"; + mark = "−"; + } else { + mark = " "; + } + const body = l.startsWith("+") || l.startsWith("-") ? l.slice(1) : l; + return `
${esc(mark)}${highlightLine(body)}
`; + }) + .join(""); +} + +function renderDiff({ slide, start, duration, i }) { + return ` +
+
+
+
+ ${esc(slide.language || "ts")} + ${esc(slide.filename || "")} + ${esc(slide.title || "")} +
+
+
+ ${renderDiffLines(slide.diff || "")} +
+
+
+
+
+
`; +} + +function renderText({ slide, start, duration, i }) { + return ` +
+
+
+
+ Summary + ${esc(repoSlug)} · #${manifest.pr} +
+

${esc(slide.title || "")}

+ ${slide.subtitle ? `

${esc(slide.subtitle)}

` : ""} +
+
`; +} + +function renderList({ slide, start, duration, i }) { + const items = (slide.items || []) + .map( + (it, n) => + `
  • ${n + 1}.${esc(it)}
  • `, + ) + .join(""); + return ` +
    +
    +
    +

    ${esc(slide.title || "")}

    +
      ${items}
    +
    +
    `; +} + +function renderImage({ slide, start, duration, i }) { + return ` +
    +
    +
    + +
    +
    `; +} + +function renderOutro({ start, duration, i }) { + return ` +
    +
    +
    +
    ${renderLogo()}
    +
    PR Walkthrough · #${manifest.pr}
    +
    Made with HyperFrames
    +
    +
    `; +} + +const RENDERERS = { + intro: renderIntro, + segment: renderSegment, + code: renderCode, + diff: renderDiff, + text: renderText, + list: renderList, + image: renderImage, + outro: renderOutro, +}; + +const slidesHtml = timed + .map((t) => { + const r = RENDERERS[t.slide.type]; + if (!r) throw new Error(`Unknown slide type: ${t.slide.type}`); + return r(t); + }) + .join(""); + +// --- Audio elements ---------------------------------------------------------- + +const audioHtml = timed + .filter(({ slide }) => slide.audio) + .map( + ({ slide, start, i }) => + ``, + ) + .join("\n"); + +// --- Captions --------------------------------------------------------------- + +const allCaptions = []; +for (const { slide, start } of timed) { + if (!slide.audio) continue; + const caps = makeCaptions(slide.audio, start); + allCaptions.push(...caps); +} + +const CAPTION_GAP = 0.002; +const captionsHtml = allCaptions + .map((c, k) => { + const dur = Math.max(0.05, c.duration - CAPTION_GAP); + return `
    ${esc(c.text)}
    `; + }) + .join("\n"); + +// --- Timeline JS ------------------------------------------------------------- + +const timelineJs = []; + +for (const { slide, start, duration, i } of timed) { + const fadeIn = 0.4; + const fadeOut = 0.4; + if (i === 0) { + timelineJs.push(`tl.set("#slide-${i}", { opacity: 1 }, ${start});`); + } else { + timelineJs.push( + `tl.fromTo("#slide-${i}", { opacity: 0 }, { opacity: 1, duration: ${fadeIn}, ease: "power2.out" }, ${start});`, + ); + } + timelineJs.push( + `tl.to("#slide-${i}", { opacity: 0, duration: ${fadeOut}, ease: "power2.in" }, ${start + duration - fadeOut});`, + ); + timelineJs.push(`tl.set("#slide-${i}", { opacity: 0 }, ${start + duration});`); + + if ((slide.type === "code" || slide.type === "diff") && slide.focus && slide.focus.length) { + const lineHeight = 36; + const focus = slide.focus; + const targets = focus.map((f) => ({ + t: start + (f.at || 0) * duration, + y: -Math.max(0, f.line - 4) * lineHeight, + })); + timelineJs.push(`tl.set("#code-scroll-${i}", { y: ${targets[0].y} }, ${start});`); + for (let k = 1; k < targets.length; k++) { + const prev = targets[k - 1]; + const cur = targets[k]; + const dur = Math.max(0.5, cur.t - prev.t); + timelineJs.push( + `tl.to("#code-scroll-${i}", { y: ${cur.y}, duration: ${dur}, ease: "power1.inOut" }, ${prev.t});`, + ); + } + } +} + +// --- Font imports ------------------------------------------------------------ + +const fontFamilies = [brand.fonts.body, brand.fonts.mono].filter(Boolean); +const fontImport = fontFamilies + .map((f) => { + const encoded = f.replace(/\s+/g, "+"); + return `${encoded}:wght@400;500;600;700`; + }) + .join("&family="); + +// --- Final HTML -------------------------------------------------------------- + +const html = ` + + + + + + + + + + + +
    + +${slidesHtml} + +
    +${captionsHtml} +
    + + + +
    + + + + +
    + +${audioHtml} + +
    + + + + +`; + +fs.writeFileSync(path.join(__dirname, "index.html"), html); + +console.log(`Wrote ${path.relative(process.cwd(), path.join(__dirname, "index.html"))}`); +console.log(` ${timed.length} slides, ${totalDuration.toFixed(2)}s total`); +console.log( + ` ${timed.filter((t) => t.slide.audio).length} audio tracks, ${allCaptions.length} captions`, +); +console.log(` Branding: ${brand.name} (${repoSlug})`); diff --git a/skills/pr-to-hyperframes/video/hyperframes.json b/skills/pr-to-hyperframes/video/hyperframes.json new file mode 100644 index 000000000..5fb1d6d87 --- /dev/null +++ b/skills/pr-to-hyperframes/video/hyperframes.json @@ -0,0 +1,9 @@ +{ + "$schema": "https://hyperframes.heygen.com/schema/hyperframes.json", + "registry": "https://raw.githubusercontent.com/heygen-com/hyperframes/main/registry", + "paths": { + "blocks": "compositions", + "components": "compositions/components", + "assets": "assets" + } +} diff --git a/skills/pr-to-hyperframes/video/meta.json b/skills/pr-to-hyperframes/video/meta.json new file mode 100644 index 000000000..e59f77d99 --- /dev/null +++ b/skills/pr-to-hyperframes/video/meta.json @@ -0,0 +1,4 @@ +{ + "id": "pr-walkthrough", + "name": "pr-walkthrough" +} diff --git a/skills/pr-to-hyperframes/video/render.sh b/skills/pr-to-hyperframes/video/render.sh new file mode 100755 index 000000000..ec5482912 --- /dev/null +++ b/skills/pr-to-hyperframes/video/render.sh @@ -0,0 +1,155 @@ +#!/bin/bash +# render.sh — Render a pr-walkthrough video from a manifest JSON file using +# hyperframes. The pipeline: +# 1. Copy referenced audio/image files into ./assets/ +# 2. Run whisper transcription on each audio file → ./transcripts/ +# 3. Run build.mjs to generate index.html +# 4. Lint + render via npx hyperframes (1080p/30fps) +# 5. Downscale + recompress to 1280×720 with ffmpeg → final MP4 +# +# Usage: +# ./render.sh +# +set -euo pipefail + +MANIFEST="${1:?Usage: render.sh }" +OUTPUT="${2:?Usage: render.sh }" + +# Resolve relative paths +[[ "$MANIFEST" != /* ]] && MANIFEST="$(pwd)/$MANIFEST" +[[ "$OUTPUT" != /* ]] && OUTPUT="$(pwd)/$OUTPUT" + +if [ ! -f "$MANIFEST" ]; then + echo "Error: Manifest not found: $MANIFEST" >&2 + exit 1 +fi + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +ASSETS_DIR="$SCRIPT_DIR/assets" +TRANSCRIPTS_DIR="$SCRIPT_DIR/transcripts" +RENDERS_DIR="$SCRIPT_DIR/renders" +MANIFEST_DIR="$(dirname "$MANIFEST")" + +mkdir -p "$ASSETS_DIR" "$TRANSCRIPTS_DIR" "$RENDERS_DIR" +mkdir -p "$(dirname "$OUTPUT")" + +echo "=== Rendering walkthrough video ===" +echo " Manifest: $MANIFEST" +echo " Output: $OUTPUT" +echo "" + +# --- 1. Extract referenced audio/image filenames from the manifest --------- +REFERENCED_FILES=$(python3 -c " +import json, sys +with open(sys.argv[1]) as f: + m = json.load(f) +files = set() +for s in m['slides']: + if 'audio' in s and s['audio']: files.add(s['audio']) + if 'src' in s and s['src']: files.add(s['src']) +# Copy logo if specified in branding +b = m.get('branding', {}) +if b.get('logo'): + files.add(b['logo']) +print('\n'.join(sorted(files))) +" "$MANIFEST") + +# --- 2. Copy referenced files into ./assets/ ------------------------------- +echo " [1/5] Copying assets..." +rm -rf "$ASSETS_DIR" +mkdir -p "$ASSETS_DIR" +for FILE in $REFERENCED_FILES; do + SRC="$MANIFEST_DIR/$FILE" + if [ -f "$SRC" ]; then + cp "$SRC" "$ASSETS_DIR/$FILE" + else + echo " Warning: Referenced file not found: $SRC" >&2 + fi +done + +# --- 3. 
Whisper transcribe each audio file (idempotent per file) ---------- +echo " [2/5] Transcribing audio (whisper)..." +for WAV in "$ASSETS_DIR"/*.wav; do + [ -f "$WAV" ] || continue + BASE=$(basename "$WAV" .wav) + OUT_JSON="$TRANSCRIPTS_DIR/$BASE.json" + if [ -f "$OUT_JSON" ] && [ "$OUT_JSON" -nt "$WAV" ]; then + continue + fi + echo " transcribing $BASE..." + (cd "$SCRIPT_DIR" && npx --yes hyperframes transcribe "assets/$BASE.wav" --json >/dev/null) + if [ -f "$SCRIPT_DIR/transcript.json" ]; then + mv "$SCRIPT_DIR/transcript.json" "$OUT_JSON" + fi +done + +# --- 4. Generate index.html ------------------------------------------------ +echo " [3/5] Building composition..." +(cd "$SCRIPT_DIR" && node build.mjs "$MANIFEST") + +# --- 5. Lint (warn but don't fail) ------------------------------------------ +(cd "$SCRIPT_DIR" && npx --yes hyperframes lint) || { + echo " Warning: lint reported issues (continuing)" >&2 +} + +# --- 6. Render at 1080p/30fps with hyperframes ----------------------------- +echo " [4/5] Rendering 1080p frames..." +RENDER_NAME="walkthrough-$$" +TEMP_RENDER="$RENDERS_DIR/$RENDER_NAME.mp4" +rm -f "$TEMP_RENDER" +(cd "$SCRIPT_DIR" && npx --yes hyperframes render \ + -q draft --crf 30 \ + -o "renders/$RENDER_NAME.mp4") >/dev/null 2>&1 & +RENDER_PID=$! + +PREV_SIZE=-1 +STABLE=0 +while kill -0 "$RENDER_PID" 2>/dev/null; do + if [ -f "$TEMP_RENDER" ]; then + SIZE=$(stat -f '%z' "$TEMP_RENDER" 2>/dev/null || stat -c '%s' "$TEMP_RENDER" 2>/dev/null || echo 0) + if [ "$SIZE" -gt 0 ] && [ "$SIZE" -eq "$PREV_SIZE" ]; then + STABLE=$((STABLE + 1)) + if [ "$STABLE" -ge 3 ]; then break; fi + else + STABLE=0 + fi + PREV_SIZE=$SIZE + fi + sleep 2 +done + +pkill -P "$RENDER_PID" 2>/dev/null || true +kill "$RENDER_PID" 2>/dev/null || true +wait "$RENDER_PID" 2>/dev/null || true + +if [ ! -f "$TEMP_RENDER" ]; then + echo "Error: hyperframes did not produce $TEMP_RENDER" >&2 + exit 1 +fi + +# --- 7. Downscale 1080p → 720p, recompress for smaller file --------------- +echo " [5/5] Downscaling to 720p / 30fps..." +ffmpeg -y -i "$TEMP_RENDER" \ + -vf "scale=1280:720:flags=lanczos,fps=30" \ + -c:v libx264 -preset slow -crf 26 -pix_fmt yuv420p \ + -c:a aac -b:a 96k -ar 48000 \ + -movflags +faststart \ + "$OUTPUT" 2>/dev/null + +rm -f "$TEMP_RENDER" + +# --- Report ------------------------------------------------------------------ +if [ -f "$OUTPUT" ]; then + FINAL_DUR=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$OUTPUT" 2>/dev/null || echo "?") + FINAL_SIZE=$(ls -lh "$OUTPUT" | awk '{print $5}') + echo "" + echo "=== Done ===" + echo " Output: $OUTPUT" + echo " Resolution: 1280×720" + echo " FPS: 30" + echo " Duration: ${FINAL_DUR}s" + echo " Size: $FINAL_SIZE" +else + echo "Error: render failed — output file not created" >&2 + exit 1 +fi