From 7e7c950fc1aea21782f5663637b6f1dce7bc36be Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Tue, 30 Jun 2026 14:21:14 -0700 Subject: [PATCH 1/2] feat: add `demo-creator` skill Add `.claude/skills/demo-creator/`, a skill that turns a one-line problem statement into a REAL recorded terminal demo plus a blog showing how to do it with `altimate-code` and why that's worth it. Highlights: - Two modes: `showcase` (default, `altimate-code` alone) and optional `comparison` (vs `claude-code`); a baseline is not required. - Two integrity laws: recordings reconcile with the session trace/`--format json` (no faking); value is shown, not asserted. - Real-run recording via `asciinema` + `agg` (`record.sh`), with `vhs` as a scripted fallback; frame extraction + authenticity cross-check (`inspect.sh`). - `extract_learnings.sh` mines a session into `learnings.md` (commands run, value-moment evidence, reusable lessons) feeding the blog and a lesson ledger. - `build_artifact.sh` inlines clips as base64 and renders headless so the page can be visually inspected before publishing as a Claude artifact. - Reuses `blog-writer`, `viral-tech-blog`, `humanizer`; `quality-bar.md` gate requires rendering and inspecting the published page, not just the clips. Validated end-to-end by dogfooding it on a PySpark SQL-port demo (artifacts kept out of this PR). Co-Authored-By: Claude Opus 4.8 (1M context) --- .claude/skills/demo-creator/SKILL.md | 235 ++++++++++++++++++ .../demo-creator/references/quality-bar.md | 68 +++++ .../demo-creator/references/recording.md | 73 ++++++ .../demo-creator/references/value-wedge.md | 65 +++++ .../scripts/baseline_vs_altimate.sh | 80 ++++++ .../demo-creator/scripts/build_artifact.sh | 67 +++++ .../demo-creator/scripts/extract_learnings.sh | 96 +++++++ .../skills/demo-creator/scripts/inspect.sh | 43 ++++ .claude/skills/demo-creator/scripts/lib.sh | 20 ++ .claude/skills/demo-creator/scripts/record.sh | 56 +++++ .../scripts/setup_pyspark_fixture.sh | 62 +++++ .../demo-creator/templates/blog.md.tmpl | 52 ++++ .../demo-creator/templates/demo.tape.tmpl | 25 ++ .../demo-creator/templates/replicate.md.tmpl | 36 +++ 14 files changed, 978 insertions(+) create mode 100644 .claude/skills/demo-creator/SKILL.md create mode 100644 .claude/skills/demo-creator/references/quality-bar.md create mode 100644 .claude/skills/demo-creator/references/recording.md create mode 100644 .claude/skills/demo-creator/references/value-wedge.md create mode 100755 .claude/skills/demo-creator/scripts/baseline_vs_altimate.sh create mode 100755 .claude/skills/demo-creator/scripts/build_artifact.sh create mode 100755 .claude/skills/demo-creator/scripts/extract_learnings.sh create mode 100755 .claude/skills/demo-creator/scripts/inspect.sh create mode 100755 .claude/skills/demo-creator/scripts/lib.sh create mode 100755 .claude/skills/demo-creator/scripts/record.sh create mode 100755 .claude/skills/demo-creator/scripts/setup_pyspark_fixture.sh create mode 100644 .claude/skills/demo-creator/templates/blog.md.tmpl create mode 100644 .claude/skills/demo-creator/templates/demo.tape.tmpl create mode 100644 .claude/skills/demo-creator/templates/replicate.md.tmpl diff --git a/.claude/skills/demo-creator/SKILL.md b/.claude/skills/demo-creator/SKILL.md new file mode 100644 index 000000000..c4898f3e4 --- /dev/null +++ b/.claude/skills/demo-creator/SKILL.md @@ -0,0 +1,235 @@ +--- +name: demo-creator +description: | + Turn a one-line problem statement into a REAL recorded terminal demo (a series of + small clips) plus a blog post showing HOW to do it with altimate-code and WHY that's + worth it. Use when the user asks to "make a demo", "record a demo", "create a demo + + blog", or "show off capability X" for altimate-code. Pipeline: deep research the real + pain -> critique -> design -> record a genuine `altimate-code run` -> extract what the + session teaches -> visually inspect the frames -> write the blog -> critique loop until + it clears the bar. Runs in showcase mode (altimate-code alone) by default, or comparison + mode (vs claude-code) when a head-to-head sharpens the point. Always ships "replicate it + yourself" steps. +license: MIT +compatibility: claude-code opencode +allowed-tools: + - Read + - Write + - Edit + - Glob + - Grep + - Bash + - WebFetch + - WebSearch + - AskUserQuestion + - Skill + - Agent + - Task +--- + +# Demo Creator: real demos that prove why altimate-code is better + +You produce **demos people believe** and **blogs that earn the click**. A demo is worth +nothing if it's staged, and a blog is worth nothing if it lists features instead of +showing a win. This skill forces both to be real and both to be earned. + +The output is usually **a series of small, tight clips** — one value-moment each — not one +long recording. Small clips are easier to record cleanly, easier to embed, and easier for +a reader to actually watch. + +## The two laws (do not break these) + +1. **Recordings are of real runs.** You record an actual `altimate-code run`, never typed-up + fake output. After recording, verify what the clip shows against the session trace + (`altimate-code trace view `) and/or the `--format json` event stream. If the clip + shows something the run didn't do, the clip is invalid — re-record. +2. **Value is shown, not asserted.** The "why use altimate-code" must come from what the + recording actually shows the tool doing — never from adjectives. Don't narrate a benefit + the clip doesn't demonstrate. If, when you watch the real run, altimate-code didn't + actually do the valuable thing, you have no demo yet: change the task, or report it + honestly. We are selling trust; a staged or overclaimed demo destroys it. + +If you ever feel pressure to fudge either law to "make the demo look good," stop and tell +the user the honest result instead. + +## Two modes — baseline is OPTIONAL + +Pick per angle (you can mix across a series): + +- **Showcase mode (default).** Just show how altimate-code does the task, and let the + *specific thing it does* — runs the SQL and diffs it, runs `check`, traces lineage, + proves equivalence — be the reason to use it. No second agent. This is the right mode + when the capability is self-evidently valuable on its own (e.g. "it verified the output," + "it caught the defect"). The viewer sees the value directly; you don't need a foil. +- **Comparison mode (optional).** Add a **claude-code** (`claude -p`) control on the same + prompt/fixture/model when a head-to-head genuinely sharpens the point — i.e. the value is + "altimate-code does X that a general agent skips." Only use it when altimate-code actually + comes out ahead on the real runs; if it ties or loses, drop the comparison (you may still + ship the showcase) and say so. An in-fork capability-disabled control is a fallback — see + value-wedge.md. + +Default to showcase. Reach for comparison only when the contrast is the point and it's real. + +## When to use / not use + +**Use it** when the user wants a demo and/or blog that shows an altimate-code capability on +a real use case. + +**Don't use it** for: pure docs/READMEs, slide decks, or marketing copy with no running +software behind it. Those don't need a recorded real run. + +## Inputs + +- A **problem statement** (e.g. *"a good demo on building spark pipelines using + altimate-code and pyspark"*). If missing, ask for it. +- Optional: target audience (practitioner vs exec), where artifacts should go, whether one + clip or a series. + +## Output layout + +Create `demos//` at the repo root: + +``` +demos// + research.md # phase 1: real pains + verbatim quotes + sources + demo-spec.md # phase 3: angles, fixtures, prompts, baseline controls + fixture/ # phase 4: the real use-case project the agent operates on + clips/ + .cast # asciinema recording of the REAL run + .gif # GIF rendered from that exact cast (agg) + .baseline.cast / .gif # ONLY in comparison mode + frames//*.png # extracted frames for visual inspection + runs/.json # --format json event stream (authenticity proof) + learnings.md # phase 6.5: what each session actually did + taught (mined from the trace) + blog.md # phase 7: the post, clips embedded + demo.html # phase 7.5: self-contained page published as a Claude artifact + REPLICATE.md # exact steps for a reader to reproduce it + MANIFEST.md # phase 9: index of everything + trace ids +``` + +The blog markdown is ALSO copied to the Obsidian vault per the user's global rule: +`/Users/anandgupta/obsidian/altimate/Research//.md`. + +## Workflow + +Track these as TaskCreate items so progress is visible. Each phase has a gate — don't +advance until it passes. + +### Phase 0 — Frame +- Slugify the topic. Decide single clip vs **series of 2–4** (default: series). +- Create the output dir. Write down the *claim* you intend to prove in one sentence. + +### Phase 1 — Deep research (grounded in real pain) +- Use the `deep-research` skill or `parallel:parallel-web-search` (fallback: WebSearch + + WebFetch). Mine Reddit (r/dataengineering etc.), Hacker News, Stack Overflow, eng blogs. +- For each pain capture: a label, **1–2 verbatim quotes (<25 words) with source URLs**, + a rough commonness signal, and a concrete scenario. +- Write `research.md`. **Gate:** at least one pain with a real, traceable quote. +- For a long/independent search, spawn a background `Agent` so you can author in parallel. + +### Phase 2 — Critique the research (honesty gate) +- For each pain, name the **specific** altimate-code capability that addresses it (a real + tool/skill — e.g. `sql_analyze`, `altimate_core_rewrite` w/ `verify_equivalence`, + dbt lineage/cost skills, `check`, warehouse tools). Read `references/value-wedge.md`. +- Drop any pain where we don't genuinely win, or where the help is really infra/ops, not a + terminal coding agent. Keep the **2–4 strongest "we win here"** angles. +- **Gate:** if nothing genuinely wins, STOP and report that honestly. Don't fabricate. + +### Phase 3 — Design the demo(s) +- For each angle write, in `demo-spec.md`: the real fixture, the exact prompt to + `altimate-code run`, the **value moment** (the single frame that makes the point), and the + **mode** — showcase (default) or comparison. Only choose comparison if a `claude -p` + control genuinely sharpens the point and altimate-code is expected to win; see + value-wedge.md. +- Keep each clip scoped to ONE moment. If it needs a paragraph to explain, split it. + +### Phase 4 — Build the real fixture +- A small but genuine project with a real bug/gap/decision the agent resolves. Not a toy + with the answer pre-written. For PySpark, `scripts/setup_pyspark_fixture.sh` bootstraps a + `local[*]` env (real execution, no cluster). +- **Gate:** the fixture runs and actually exhibits the problem before the agent touches it. + +### Phase 5 — Record (real runs) +- **Showcase mode:** `scripts/record.sh <topic> <angle> --cwd <fixture> --banner '<prompt>' + -- altimate-code run --yolo --trace '<prompt>'` records the real run with **asciinema** → + `.cast`, then renders to `.gif` with **agg**. Capture the json too: + `altimate-code run --format json … > runs/<angle>.json` (authenticity proof). `vhs` is the + optional scripted-polish fallback (see `references/recording.md`). +- **Comparison mode (optional):** also run `scripts/baseline_vs_altimate.sh <topic> <angle> + --cwd <fixture> --prompt '...' --reset-from <pristine>` to capture the `claude -p` control + on the same prompt/model, and record its clip too. +- **Gate:** `.cast` + `.gif` + `runs/<angle>.json` exist for the altimate run (plus the + baseline variant in comparison mode). + +### Phase 6 — Visual inspection (use your eyes) +- `scripts/inspect.sh <topic> <angle>` extracts key PNG frames. **Read them** and confirm: + legible at embed size; the value moment is visible; no stack-trace garbage or dead air; + timing is watchable. In comparison mode, also confirm the baseline genuinely struggles + where altimate wins. +- Cross-check the frames against `runs/<angle>.json` / `trace view` — nothing shown may be + absent from the real run. **Gate:** if any check fails, fix the fixture/tape and + re-record. Loop here, don't paper over it in the blog. +- **If you also build an HTML demo page or publish an artifact, RENDER IT and look** — + `chrome --headless=new --screenshot --window-size=W,H file://…`, then Read the PNG. + Inspecting the clips is not inspecting the page; layout bugs (empty grid cells, GIFs that + are mostly empty terminal) only show in the rendered page. See `quality-bar.md`. + +### Phase 6.5 — Learn from the session +- Run `scripts/extract_learnings.sh <topic> <angle>` to mine the recorded session + (`runs/<angle>.json` + the trace + the run log) into `learnings.md`: the tool/command + sequence altimate-code actually ran, the value moment(s), anything surprising (a defect it + caught, a bug it hit), and a one-line "why this matters." This is sourced from the REAL + run, so it's both demo material AND a check on law #2 — if the "value moment" isn't in the + extracted sequence, you don't have it. +- Two outputs from the learnings: + 1. **Into the blog** (phase 7): the "what it actually did" beat comes straight from here, + not from your memory of the run. + 2. **Back into the skill / memory:** if the session taught something reusable — a product + bug, an env gotcha, a prompt that reliably triggers the value behavior, a fixture + pattern that works — append it to `MANIFEST.md`'s ledger and, if it should persist + across sessions, save it to memory. Each demo run makes the next one sharper. + +### Phase 7 — Write the blog +- Invoke `blog-writer` (structure/voice) and `viral-tech-blog` (hook/title/shareability). + Use `templates/blog.md.tmpl` as the skeleton: open with the researched pain + a real + quote, then **show what altimate-code did** (sourced from `learnings.md`) with the clip, + and name the one capability that made it valuable. In comparison mode, add the + baseline → after contrast; in showcase mode, the value stands on its own. Close with + **Replicate it yourself** (exact commands — also write `REPLICATE.md`). +- Embed clips (GIF inline; link the `.cast` for copy-paste). Save `blog.md`, then copy to + the Obsidian vault path above. + +### Phase 7.5 — Publish as a Claude artifact +- **Load the `artifact-design` skill first** (required before authoring the page), and design + a real visual identity — don't ship the generic dark-terminal / mono-headline default. +- Author the page as an HTML template (the blog content + the clips as + `<img src="clips/<angle>.gif">`). Then make it self-contained: + `scripts/build_artifact.sh <topic> <template.html> --render` inlines every clip as a base64 + data URI (the artifact CSP blocks external hosts) → writes `demo.html` and renders a + screenshot. +- **RENDER-INSPECT before publishing** (this is mandatory, not optional — see the artifact + checklist in `quality-bar.md`): Read the rendered PNG and confirm no empty grid cells, no + mostly-empty GIFs, no horizontal scroll, legible type. Fix and rebuild until clean. +- Publish `demo.html` with the **Artifact** tool. On later edits, redeploy to the **same** + file path so it keeps the same URL; keep the favicon stable across redeploys. Record the + artifact URL in MANIFEST.md. + +### Phase 8 — Critique loop +- Run the `humanizer` skill (kill AI-slop) and self-critique against + `references/quality-bar.md`. Optionally run `consensus:review` on the draft. +- Loop back to the phase that failed (re-research, re-record, rewrite) until the bar is met + or the budget is spent. **State residual weaknesses honestly** in MANIFEST.md. + +### Phase 9 — Deliver +- Write `MANIFEST.md`: every clip + cast + frame dir, the blog (local + vault paths), + REPLICATE.md, research sources, and the **trace ids** for each recorded run (so anyone + can re-verify authenticity). Summarize what was proven and what wasn't. + +## References (read when you reach the relevant phase) +- `references/value-wedge.md` — finding the real "why us" and designing an honest baseline. +- `references/recording.md` — asciinema/agg/vhs mechanics, Wait-for-completion, frames. +- `references/quality-bar.md` — the gate rubric for the demo and the blog. + +## Reused skills (invoke, don't reimplement) +`deep-research` / `parallel:parallel-web-search` (phase 1), `blog-writer` + `viral-tech-blog` +(phase 7), `humanizer` + `consensus:review` (phase 8). diff --git a/.claude/skills/demo-creator/references/quality-bar.md b/.claude/skills/demo-creator/references/quality-bar.md new file mode 100644 index 000000000..ef85d2c2c --- /dev/null +++ b/.claude/skills/demo-creator/references/quality-bar.md @@ -0,0 +1,68 @@ +# Quality bar — the gate for demo + blog + +Score each item. If any **must-pass** fails, loop back and fix; do not ship. + +## The demo (clip) + +Must-pass (both modes): +- [ ] **Real.** The clip reconciles with the session trace / `--format json`. Nothing shown + is absent from the real run. +- [ ] **Value is shown.** altimate-code visibly *does* the valuable thing on camera (runs the + diff, flags the defect, proves equivalence) — not narrated, not merely claimed. It's in + `learnings.md`'s value-moment evidence, or you don't have it. +- [ ] **Legible.** Text is readable at the size it'll embed; no clipped lines. +- [ ] **One moment.** A single freezable frame makes the point. + +Must-pass (comparison mode only): +- [ ] **Fair baseline.** claude-code (`claude -p`), same model + prompt + fixture; the only + difference is altimate-code itself. Baseline failure is the *natural* result, not a + strawman (no sabotaged prompt). +- [ ] **altimate actually comes out ahead.** Visible, not narrated. If it ties or loses, drop + the comparison and ship showcase-only (or report it) — see value-wedge.md. + +Should-pass: +- [ ] ≤ ~45s of watchable content; dead air trimmed (idle compression only, no faking). +- [ ] The value moment lands in the first half of the clip. + +## The blog + +Must-pass (anti-slop — the `humanizer` skill enforces most of these): +- [ ] Opens with a **real pain + a real quote** (sourced), not "In today's data landscape…". +- [ ] **Shows, doesn't claim.** Showcase: the clip shows altimate-code doing the valuable + thing. Comparison: the baseline → after contrast is in the clip. Either way, demonstrated. +- [ ] Names the **specific capability** that made the difference (not "AI magic"). +- [ ] Has a **Replicate it yourself** section a reader could actually follow to the same + result (exact commands; matches REPLICATE.md / the fixture scripts). +- [ ] **Honest about limits.** Says where it wouldn't help / what the baseline got right. +- [ ] No feature-list dump, no rule-of-three filler, no "delve/leverage/seamless", minimal + em-dashes, no fabricated metrics. + +Should-pass (the `viral-tech-blog` layer): +- [ ] A title that states the surprising result, not the topic. +- [ ] One quotable stat or coined phrase a reader could repeat. +- [ ] A first line that hooks (the surprising claim or the pain), not a throat-clear. +- [ ] An ending that invites discussion (a question, a tradeoff, a "where this breaks"). + +## The rendered page / HTML artifact (if you build one) +Inspecting the clips is NOT inspecting the page. If you produce an HTML demo page or +publish an artifact, you MUST render the actual page and look at it: +- [ ] Render it (`chrome --headless=new --screenshot --window-size=W,H file://…`) and Read + the PNG. Don't trust that the markup "looks right" in source. +- [ ] No empty/dead layout cells — a common bug is an N-column grid with content in only + one cell (check every `grid-template-columns` has a child per column). +- [ ] Embedded GIFs aren't mostly empty space. A static screenshot shows GIF frame 1 + (often a near-empty terminal); that's an artifact of the capture, but ALSO re-render + the GIF at a row count that matches its real content so the live clip isn't 80% void + (e.g. `claude -p` print-mode output is tiny — crop it with `agg --rows`). +- [ ] Page body never scrolls sideways; wide blocks have their own `overflow-x:auto`. +- [ ] Self-contained: assets embedded as data URIs (artifact CSP blocks external hosts). + +## Series-level (when shipping multiple clips) +- [ ] Each clip is independent and proves a distinct capability. +- [ ] Together they tell one coherent "this is why altimate-code" story. +- [ ] No clip repeats another's value moment. + +## Honesty ledger (always write into MANIFEST.md) +- What was proven, with which trace ids. +- What was NOT proven / where altimate tied or lost. +- Any idle-trimming or edits applied to clips (and confirmation no output was fabricated). diff --git a/.claude/skills/demo-creator/references/recording.md b/.claude/skills/demo-creator/references/recording.md new file mode 100644 index 000000000..c5ac11354 --- /dev/null +++ b/.claude/skills/demo-creator/references/recording.md @@ -0,0 +1,73 @@ +# Recording mechanics: asciinema + agg (primary), vhs (fallback) + +## Why this stack + +- **asciinema** records the **real pty session** — including when you run it inside iTerm. + Output is a `.cast` (JSON-lines of timing + bytes): copy-pasteable, replayable, and a + faithful record of what actually happened. This is the authenticity anchor. +- **agg** renders a `.cast` to an animated `.gif`. Crucially it renders the **captured real + run**, so the GIF is the real session — not a re-execution. +- **vhs** is a *scripted* recorder: it re-runs the command in its own synthetic terminal. + Use it only when you want a perfectly clean, deterministic "polished" version and you've + already proven the real behavior with an asciinema cast. Never let vhs output be the only + evidence a thing worked. + +Install once: `brew install asciinema agg` (vhs already present: `brew install vhs`). + +## Primary path — record a real run + +`scripts/record.sh` wraps this. The essence: + +```bash +# Record the real command. asciinema runs it and captures everything. +asciinema rec --overwrite --command "<the real command>" clips/<angle>.cast +# Render that exact cast to a GIF. +agg --cols 100 --rows 30 --font-size 22 clips/<angle>.cast clips/<angle>.gif +``` + +Keep the terminal small (≈100×30) so the GIF is legible when embedded. Set a readable +font size. For long agent runs, prefer non-interactive `altimate-code run --yolo` so the +session ends on its own (asciinema stops when the command exits). + +### Handling long / nondeterministic agent runs +- Use `altimate-code run --yolo --max-turns N` so it can't run away. +- Trim dead air after the fact if needed: edit the `.cast` (it's just timed events) or use + `asciinema` quantize-style trimming; agg has `--idle-time-limit` to cap long pauses: + `agg --idle-time-limit 2 ...` collapses gaps >2s. This keeps clips watchable without + faking anything — it only compresses waiting, not output. + +## Fallback path — vhs (scripted polish) + +A `.tape` drives a synthetic terminal. Useful patterns (vhs ≥0.11 supports these): + +``` +Require altimate-code +Output clips/<angle>.gif +Set FontSize 22 +Set Width 1000 +Set Height 600 +Type "altimate-code run --yolo 'PROMPT'" Enter +Wait@180s /Done|✔|session ended/ # wait for a completion marker, not a fixed sleep +Screenshot clips/frames/<angle>/value-moment.png +Sleep 2s +``` + +`Wait@<timeout> /regex/` blocks until output matches — essential because agent runs vary in +length. `Screenshot` grabs a frame at that instant (no ffmpeg needed). Because vhs re-runs +the command, treat its GIF as illustrative and keep the asciinema cast as the real record. + +## Extracting frames for visual inspection (`scripts/inspect.sh`) + +From a GIF: +```bash +ffmpeg -i clips/<angle>.gif -vf "fps=1" clips/frames/<angle>/f_%03d.png # 1 frame/sec +``` +Then `Read` the PNGs to verify legibility and the value moment. Also dump a few from the +baseline GIF so you can eyeball the contrast side by side. + +## Authenticity cross-check +After recording, reconcile the clip with the real run: +- `altimate-code trace list` → find the session; `altimate-code trace view <id>`. +- Or read `runs/<angle>.json` (the `--format json` event stream). +The tools the clip appears to call, and the result it shows, must be present there. If the +GIF shows a verified-equivalence badge, the json/trace must contain that verification event. diff --git a/.claude/skills/demo-creator/references/value-wedge.md b/.claude/skills/demo-creator/references/value-wedge.md new file mode 100644 index 000000000..963d582fe --- /dev/null +++ b/.claude/skills/demo-creator/references/value-wedge.md @@ -0,0 +1,65 @@ +# Finding the wedge (and an optional, honest baseline) + +The whole demo hinges on one question: **what does altimate-code do here that's worth it — +and can we make a viewer SEE that in 20 seconds?** Sometimes the clearest way to show it is +altimate-code alone (showcase); sometimes it's a head-to-head with a generic agent +(comparison). Decide per angle; default to showcase. + +## Step 1 — Map pain → a specific capability + +A vague "it's an AI agent for data" is not a wedge. Tie each researched pain to a concrete, +named altimate-code capability that a generic agent lacks or does worse. Examples of real +capabilities to reach for (verify the tool/skill exists before you claim it): + +- **SQL/transform equivalence** — `altimate_core_rewrite` with `verify_equivalence`, + `altimate_core_equivalence`. Proves a rewrite didn't change results. Generic agents just + *assert* equivalence. +- **Deterministic SQL checks** — `altimate-code check` (lint / validate / safety / policy / + pii, no LLM). Catches issues a chat agent eyeballs and misses. +- **dbt-native understanding** — lineage, impact, cost-attribution, scheduling skills; + compiled-SQL awareness (not regex over Jinja). +- **Warehouse cost intelligence** — finops skills, dry-run cost, full-scan/spill detection. +- **Multi-warehouse schema awareness** — real schema introspection vs guessing column names. + +If the pain you found doesn't map to one of these (or another real, demonstrable +capability), it is **not a demo angle**. Cut it. A demo of a generic capability any agent +has tells the viewer nothing about why to choose us. + +## Step 2 — Pick the mode + +**Showcase (default).** Show altimate-code doing the task, and let the specific thing it +does carry the value: it runs the SQL and diffs the output, it runs `check` and flags a real +defect, it traces lineage and shows the blast radius, it proves equivalence. The viewer sees +the value directly. No second agent. Use this whenever the capability is self-evidently +useful on its own — which is most of the time. + +**Comparison (optional).** Add a **claude-code** (`claude -p`) control on the same +prompt/fixture/model only when the value *is* the contrast — "a general agent skips this +step; altimate-code doesn't." It answers "I already have Claude Code; why switch?" but it +costs you a fair-test burden (below) and only works if altimate-code actually comes out +ahead. If unsure, start showcase; add the comparison only if the contrast turns out real. + +If you do compare, keep it FAIR and ATTRIBUTABLE: same model, same prompt, same fixture, so +the only difference is altimate-code. Hold the model constant (`--model` / `--baseline-model`). +A fallback control is **capability-disabled in-fork** (`--baseline-mode disable --baseline-deny +<tool>`), useful to isolate one specific tool. Never give the baseline a worse model, vaguer +prompt, or broken env — a rigged contrast is worse than no contrast. + +## Step 3 — The honesty gate + +Watch the real run(s). Then: + +- **Showcase:** did altimate-code actually do the valuable thing on camera (the diff ran, the + defect was flagged, equivalence was proved)? If yes → ship. If it only *claimed* to, or + didn't get there → no demo yet; change the task or report honestly. +- **Comparison:** altimate clearly ahead → ship the contrast. Tie/marginal → drop the + comparison (you may still ship the showcase) and say so. Baseline wins/altimate regresses → + do not ship a contrast; report it — a real product finding beats a fake demo. + +## Step 4 — Name the value moment + +Every clip needs ONE frame you could freeze and put on a slide: equivalence VERIFIED; `check` +flagging a real defect; the SQL-vs-output `diff` coming back zero; lineage showing the blast +radius. In showcase mode that frame stands alone; in comparison mode it's the frame the +baseline never produces. Design the prompt so that moment happens on camera. If you can't +name the value moment, the demo isn't ready. diff --git a/.claude/skills/demo-creator/scripts/baseline_vs_altimate.sh b/.claude/skills/demo-creator/scripts/baseline_vs_altimate.sh new file mode 100755 index 000000000..fc67ed85c --- /dev/null +++ b/.claude/skills/demo-creator/scripts/baseline_vs_altimate.sh @@ -0,0 +1,80 @@ +#!/usr/bin/env bash +# Run the SAME task twice and capture each run for comparison: +# baseline = claude-code (the base agent: `claude -p`) [default] +# altimate = altimate-code (the fork with data tooling) +# Same prompt, same fixture, same underlying model -> the only difference is altimate-code's +# additions. This is the honest "why use us over plain Claude Code" control. +# +# It captures the json event stream for each run (authenticity proof). It does NOT render +# GIFs — use record.sh to capture the watchable clip of whichever run(s) you show. +# +# Usage: +# baseline_vs_altimate.sh <topic> <angle> --cwd DIR --prompt "..." \ +# [--reset-from PRISTINE_DIR] [--max-turns N] \ +# [--model PROVIDER/MODEL] # altimate-code model (e.g. anthropic/claude-...) +# [--baseline-model ALIAS] # claude-code model alias (default: sonnet) +# [--baseline-mode claude|disable] [--baseline-deny TOOL,TOOL] [--baseline-agent NAME] +set -euo pipefail +DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"; source "$DIR/lib.sh" + +TOPIC=""; ANGLE=""; CWD="$PWD"; PROMPT=""; MODEL=""; BASE_MODEL="sonnet"; MAXT="40" +RESET_FROM=""; BASE_MODE="claude"; BASE_DENY=""; BASE_AGENT="" +while [ $# -gt 0 ]; do + case "$1" in + --cwd) CWD="$2"; shift 2;; + --prompt) PROMPT="$2"; shift 2;; + --model) MODEL="$2"; shift 2;; + --baseline-model) BASE_MODEL="$2"; shift 2;; + --reset-from) RESET_FROM="$2"; shift 2;; + --max-turns) MAXT="$2"; shift 2;; + --baseline-mode) BASE_MODE="$2"; shift 2;; # claude (default) | disable + --baseline-deny) BASE_DENY="$2"; shift 2;; # only for --baseline-mode disable + --baseline-agent) BASE_AGENT="$2"; shift 2;; # only for --baseline-mode disable + -*) echo "unknown opt: $1" >&2; exit 2;; + *) if [ -z "$TOPIC" ]; then TOPIC="$1"; elif [ -z "$ANGLE" ]; then ANGLE="$1"; fi; shift;; + esac +done +[ -n "$TOPIC" ] && [ -n "$ANGLE" ] && [ -n "$PROMPT" ] || { + echo "usage: baseline_vs_altimate.sh <topic> <angle> --cwd DIR --prompt '...' [...]" >&2; exit 2; } +need altimate-code + +DD="$(demo_dir "$TOPIC")"; RUNS="$DD/runs"; mkdir -p "$RUNS" + +reset_fixture() { + [ -n "$RESET_FROM" ] || return 0 + [ -d "$RESET_FROM" ] || { echo "ERROR: --reset-from '$RESET_FROM' is not a dir" >&2; return 1; } + log " resetting fixture: rsync $RESET_FROM/ -> $CWD/" + rsync -a --delete --exclude '.git' "$RESET_FROM"/ "$CWD"/ +} + +# ---------------- baseline ---------------- +log "=== BASELINE run ($BASE_MODE) ===" +reset_fixture +if [ "$BASE_MODE" = "claude" ]; then + need claude + ( cd "$CWD" && claude -p --dangerously-skip-permissions --max-turns "$MAXT" \ + --output-format json --model "$BASE_MODEL" "$PROMPT" ) \ + | tee "$RUNS/$ANGLE.baseline.json" || true +else + # In-fork isolation: run altimate-code but strip the capability under test. + B_ARGS=(); [ -n "$BASE_AGENT" ] && B_ARGS+=(--agent "$BASE_AGENT") + M_ARGS=(); [ -n "$MODEL" ] && M_ARGS+=(--model "$MODEL") + if [ -n "$BASE_DENY" ]; then + export OPENCODE_PERMISSION="{\"tools\":{$(echo "$BASE_DENY" | awk -F, '{for(i=1;i<=NF;i++){printf "%s\"%s\":\"deny\"", (i>1?",":""), $i}}')}}" + log " baseline OPENCODE_PERMISSION=$OPENCODE_PERMISSION" + fi + ( cd "$CWD" && altimate-code run --yolo --max-turns "$MAXT" --format json --trace \ + "${M_ARGS[@]}" "${B_ARGS[@]}" "$PROMPT" ) | tee "$RUNS/$ANGLE.baseline.json" || true + unset OPENCODE_PERMISSION +fi + +# ---------------- altimate ---------------- +log "=== ALTIMATE run ===" +reset_fixture +M_ARGS=(); [ -n "$MODEL" ] && M_ARGS+=(--model "$MODEL") +( cd "$CWD" && altimate-code run --yolo --max-turns "$MAXT" --format json --trace \ + "${M_ARGS[@]}" "$PROMPT" ) | tee "$RUNS/$ANGLE.json" || true + +log "json captured: $RUNS/$ANGLE.baseline.json + $RUNS/$ANGLE.json" +log "fair-comparison check: baseline model='$BASE_MODEL', altimate model='${MODEL:-<default>}' — keep these equivalent." +log "trace ids: 'altimate-code trace list' (altimate run is traced; the claude run leaves its own transcript)." diff --git a/.claude/skills/demo-creator/scripts/build_artifact.sh b/.claude/skills/demo-creator/scripts/build_artifact.sh new file mode 100755 index 000000000..23e90ff68 --- /dev/null +++ b/.claude/skills/demo-creator/scripts/build_artifact.sh @@ -0,0 +1,67 @@ +#!/usr/bin/env bash +# Build a SELF-CONTAINED HTML artifact from a page template by inlining every relative +# image src (clips/*.gif, frames/*.png, …) as a base64 data URI. Claude artifacts run under +# a strict CSP that blocks external hosts, so assets MUST be embedded. Optionally renders the +# result headless so you can visually inspect it BEFORE publishing (do not skip that). +# +# Usage: +# build_artifact.sh <topic> <template.html> [--out demo.html] [--render] [--width 820] [--height 5200] +# Then publish the written file with the Artifact tool (redeploy to the same path/URL on edits). +set -euo pipefail +DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"; source "$DIR/lib.sh" + +[ $# -ge 2 ] || { echo "usage: build_artifact.sh <topic> <template.html> [--out NAME] [--render] [--width W] [--height H]" >&2; exit 2; } +TOPIC="$1"; TPL="$2"; shift 2 +OUT="demo.html"; RENDER=0; W=820; H=5200 +while [ $# -gt 0 ]; do case "$1" in + --out) OUT="$2"; shift 2;; + --render) RENDER=1; shift;; + --width) W="$2"; shift 2;; + --height) H="$2"; shift 2;; + *) echo "unknown opt: $1" >&2; exit 2;; +esac; done + +[ -f "$TPL" ] || { echo "ERROR: template '$TPL' not found" >&2; exit 1; } +DD="$(demo_dir "$TOPIC")"; OUTPATH="$DD/$OUT" +log "building self-contained artifact: $TPL -> $OUTPATH (base dir $DD)" + +python3 - "$TPL" "$DD" "$OUTPATH" <<'PY' +import base64, mimetypes, os, re, sys +tpl, base, out = sys.argv[1:4] +html = open(tpl, encoding="utf-8").read() +mimetypes.add_type("image/svg+xml", ".svg") +inlined = [0]; missing = [] +def datauri(rel): + p = os.path.normpath(os.path.join(base, rel)) + if not (p.startswith(os.path.abspath(base)) and os.path.isfile(p)): + missing.append(rel); return None + mime = mimetypes.guess_type(p)[0] or "application/octet-stream" + inlined[0] += 1 + return "data:%s;base64,%s" % (mime, base64.b64encode(open(p,"rb").read()).decode()) +def repl(m): + pre, q, url, post = m.group(1), m.group(2), m.group(3), m.group(4) + if re.match(r'^(https?:|data:|//)', url): return m.group(0) + d = datauri(url) + return m.group(0) if d is None else "%s%s%s%s" % (pre, q, d, post) +# src="..." and CSS url(...) for local image paths +html = re.sub(r'(src=)(["\'])([^"\']+\.(?:gif|png|jpe?g|svg|webp))(["\'])', repl, html, flags=re.I) +html = re.sub(r'(url\()(["\']?)([^)"\']+\.(?:gif|png|jpe?g|svg|webp))(["\']?\))', repl, html, flags=re.I) +open(out, "w", encoding="utf-8").write(html) +print("inlined %d asset(s); %.2f MB" % (inlined[0], len(html)/1e6)) +if missing: print("WARNING: could not inline (will break under CSP): %s" % ", ".join(missing), file=sys.stderr) +PY + +if [ "$RENDER" = "1" ]; then + CHROME="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" + SHOT="$DD/clips/frames/_artifact_render.png"; mkdir -p "$(dirname "$SHOT")" + if [ -x "$CHROME" ]; then + "$CHROME" --headless=new --disable-gpu --hide-scrollbars --force-device-scale-factor=1 \ + --window-size="$W,$H" --screenshot="$SHOT" "file://$OUTPATH" 2>/dev/null + log "rendered for inspection -> $SHOT (Read it; check empty grid cells, mostly-empty GIFs, no h-scroll)" + echo "$SHOT" + else + log "Chrome not found; render skipped. Open $OUTPATH manually to inspect before publishing." + fi +fi +log "artifact ready: $OUTPATH (publish with the Artifact tool; redeploy to the same path on edits)" +echo "$OUTPATH" diff --git a/.claude/skills/demo-creator/scripts/extract_learnings.sh b/.claude/skills/demo-creator/scripts/extract_learnings.sh new file mode 100755 index 000000000..0f33c5099 --- /dev/null +++ b/.claude/skills/demo-creator/scripts/extract_learnings.sh @@ -0,0 +1,96 @@ +#!/usr/bin/env bash +# Mine a recorded altimate-code session into learnings.md: what the run actually DID +# (tool/command sequence), the verification/value markers it hit, and raw material for the +# "what it did" beat in the blog + the cross-session learning ledger. +# +# It only extracts evidence from the real run — it does not invent takeaways. The agent +# reads learnings.md and writes the one-line "why this matters" + any reusable lesson. +# +# Usage: extract_learnings.sh <topic> <angle> [--baseline] (--baseline mines the claude run too) +DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"; source "$DIR/lib.sh" +# Best-effort miner: grep no-matches and head-closed pipes are expected, so don't let +# lib.sh's `set -euo pipefail` abort us on them. +set +e +o pipefail + +[ $# -ge 2 ] || { echo "usage: extract_learnings.sh <topic> <angle> [--baseline]" >&2; exit 2; } +TOPIC="$1"; ANGLE="$2"; WITH_BASE=0 +[ "${3:-}" = "--baseline" ] && WITH_BASE=1 + +DD="$(demo_dir "$TOPIC")" +JSON="$DD/runs/$ANGLE.json" +LOG="$DD/runs/$ANGLE.log"; [ -f "$LOG" ] || LOG="$DD/runs/altimate.log" +OUT="$DD/learnings.md" + +# Pull the source the run actually produced. Prefer json; fall back to the captured log. +SRC="" +[ -f "$JSON" ] && SRC="$JSON" +[ -z "$SRC" ] && [ -f "$LOG" ] && SRC="$LOG" +[ -n "$SRC" ] || { echo "ERROR: no run json/log for $TOPIC/$ANGLE (runs/$ANGLE.json or runs/altimate.log)" >&2; exit 1; } +log "mining $SRC -> $OUT" + +# Strip ANSI escape codes so the mined evidence is clean (formatted logs are full of them). +CLEAN="$(mktemp)"; trap 'rm -f "$CLEAN"' EXIT +perl -pe 's/\x1b\[[0-9;]*[a-zA-Z]//g' "$SRC" > "$CLEAN" 2>/dev/null || cp "$SRC" "$CLEAN" +SRC="$CLEAN" + +# Tool/command sequence the agent executed (bash tool-calls + tool names). +extract_tools() { # extract_tools <file> + grep -oE '"(tool|name)"[[:space:]]*:[[:space:]]*"[^"]+"' "$1" 2>/dev/null | sed -E 's/.*"([^"]+)"$/\1/' \ + | grep -viE '^(text|message|assistant|user|step)$' || true +} +# Shell commands captured in the recorded output (works on the formatted log too). +extract_cmds() { # extract_cmds <file> + grep -oE '(\$ |bash |python |diff |altimate-code |duckdb |run_sql)[^"]*' "$1" 2>/dev/null \ + | sed -E 's/\\u[0-9a-fA-F]{4}//g' | grep -vE '^\s*$' | head -40 || true +} +# Verification / value-moment markers — evidence the valuable thing happened on camera. +VALUE_MARKERS='run_sql|diff |byte-for-byte|identical|zero diff|verify|expected_sql|equival|sql_analyze|altimate_core|\bcheck\b|lineage|cost|row count|assert' + +TOOLS="$(extract_tools "$SRC" | sort | uniq -c | sort -rn | head -25)" +CMDS="$(extract_cmds "$SRC")" +VALUE_HITS="$(grep -niE "$VALUE_MARKERS" "$SRC" 2>/dev/null | sed -E 's/\\u[0-9a-fA-F]{4}//g' | head -25)" +TRACE_ID="$(grep -oE 'ses_[A-Za-z0-9]+' "$SRC" 2>/dev/null | head -1)" + +{ + echo "# Learnings — $TOPIC / $ANGLE" + echo + echo "_Mined from \`$(basename "$SRC")\`$( [ -n "$TRACE_ID" ] && echo " · trace \`$TRACE_ID\`")._" + echo "_All items below are extracted from the REAL run. The 'why it matters' and reusable" + echo "lessons are filled in by the author after reading this — do not invent value the run" + echo "did not show (law #2)._" + echo + echo "## What altimate-code actually ran" + echo '```' + [ -n "$CMDS" ] && echo "$CMDS" || echo "(no shell commands parsed — inspect $SRC directly)" + echo '```' + echo + echo "## Tools / events observed (count)" + echo '```' + [ -n "$TOOLS" ] && echo "$TOOLS" || echo "(none parsed)" + echo '```' + echo + echo "## Value-moment evidence (verification / capability markers)" + if [ -n "$VALUE_HITS" ]; then echo '```'; echo "$VALUE_HITS"; echo '```'; else + echo "**NONE FOUND.** The run shows no verification/capability marker — per law #2 you" + echo "do not yet have a value moment for this angle. Change the task or report honestly." + fi + echo + echo "## Why it matters <!-- author fills in: one line, grounded in the evidence above -->" + echo + echo "## Reusable lesson <!-- author: product bug / env gotcha / prompt that triggers the value / fixture pattern." + echo " If it should persist across sessions, also save to memory + MANIFEST ledger. -->" +} > "$OUT" + +if [ "$WITH_BASE" = "1" ] && [ -f "$DD/runs/$ANGLE.baseline.json" ]; then + { + echo + echo "## Baseline (claude-code) — value markers, for the contrast" + BH="$(grep -niE "$VALUE_MARKERS" "$DD/runs/$ANGLE.baseline.json" 2>/dev/null | head -20)" + if [ -n "$BH" ]; then echo '```'; echo "$BH"; echo '```'; else + echo "(baseline shows none of: $VALUE_MARKERS — this absence IS the contrast.)" + fi + } >> "$OUT" +fi + +log "wrote $OUT" +echo "$OUT" diff --git a/.claude/skills/demo-creator/scripts/inspect.sh b/.claude/skills/demo-creator/scripts/inspect.sh new file mode 100755 index 000000000..0717dcc5c --- /dev/null +++ b/.claude/skills/demo-creator/scripts/inspect.sh @@ -0,0 +1,43 @@ +#!/usr/bin/env bash +# Extract PNG frames from a clip's GIF for visual inspection, and print the authenticity +# cross-check inputs (json event summary + how to view the trace). +# Usage: inspect.sh <topic> <angle> [--fps N] [--prefix NAME] +set -euo pipefail +DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"; source "$DIR/lib.sh" + +[ $# -ge 2 ] || { echo "usage: inspect.sh <topic> <angle> [--fps N] [--prefix NAME]" >&2; exit 2; } +TOPIC="$1"; ANGLE="$2"; shift 2 +FPS=1; PREFIX="$ANGLE" +while [ $# -gt 0 ]; do + case "$1" in + --fps) FPS="$2"; shift 2;; + --prefix) PREFIX="$2"; shift 2;; + *) echo "unknown opt: $1" >&2; exit 2;; + esac +done +need ffmpeg ffmpeg + +DD="$(demo_dir "$TOPIC")" +GIF="$DD/clips/$PREFIX.gif" +FRAMES="$DD/clips/frames/$PREFIX" +[ -f "$GIF" ] || { echo "ERROR: no GIF at $GIF (record it first)" >&2; exit 1; } +mkdir -p "$FRAMES"; rm -f "$FRAMES"/f_*.png + +log "extracting frames @ ${FPS}fps -> $FRAMES" +ffmpeg -y -loglevel error -i "$GIF" -vf "fps=$FPS" "$FRAMES/f_%03d.png" +COUNT=$(ls "$FRAMES"/f_*.png 2>/dev/null | wc -l | tr -d ' ') +log "wrote $COUNT frames. Read them to verify legibility + the value moment." + +# --- Authenticity cross-check pointers --- +JSON="$DD/runs/$ANGLE.json" +if [ -f "$JSON" ]; then + log "json event stream present: $JSON" + echo "----- tool calls observed in the run (cross-check against the clip) -----" + # Best-effort: pull tool/event names out of the json stream (works for line-delimited json). + grep -oE '"(tool|name|type)"[[:space:]]*:[[:space:]]*"[^"]+"' "$JSON" 2>/dev/null | sort | uniq -c | sort -rn | head -40 || true + echo "------------------------------------------------------------------------" +else + log "NOTE: no $JSON — run baseline_vs_altimate.sh (or run with --format json) to get the authenticity proof." +fi +echo "To view the recorded session trace: altimate-code trace list then altimate-code trace view <id>" +echo "FRAMES_DIR=$FRAMES" diff --git a/.claude/skills/demo-creator/scripts/lib.sh b/.claude/skills/demo-creator/scripts/lib.sh new file mode 100755 index 000000000..dfbe228bf --- /dev/null +++ b/.claude/skills/demo-creator/scripts/lib.sh @@ -0,0 +1,20 @@ +#!/usr/bin/env bash +# Shared helpers for demo-creator scripts. Source this; don't run directly. +set -euo pipefail + +# Repo root = three levels up from .claude/skills/demo-creator/scripts/ +SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +REPO_ROOT="$(cd "$SKILL_DIR/../../.." && pwd)" + +demo_dir() { # demo_dir <topic-slug> + echo "$REPO_ROOT/demos/$1" +} + +need() { # need <cmd> [brew-pkg] + if ! command -v "$1" >/dev/null 2>&1; then + echo "ERROR: '$1' not found. Install: brew install ${2:-$1}" >&2 + return 1 + fi +} + +log() { printf '\033[36m[demo-creator]\033[0m %s\n' "$*" >&2; } diff --git a/.claude/skills/demo-creator/scripts/record.sh b/.claude/skills/demo-creator/scripts/record.sh new file mode 100755 index 000000000..53d2dc62b --- /dev/null +++ b/.claude/skills/demo-creator/scripts/record.sh @@ -0,0 +1,56 @@ +#!/usr/bin/env bash +# Record a REAL run with asciinema, then render that exact cast to a GIF with agg. +# Usage: +# record.sh <topic-slug> <angle> [--cwd DIR] [--cols N] [--rows N] [--font N] \ +# [--idle SECS] [--out-prefix NAME] -- <command...> +# Everything after `--` is the real command that gets recorded (e.g. altimate-code run ...). +# Produces: demos/<topic>/clips/<out-prefix>.cast and .gif +set -euo pipefail +DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"; source "$DIR/lib.sh" + +[ $# -ge 2 ] || { echo "usage: record.sh <topic> <angle> [opts] -- <command...>" >&2; exit 2; } +TOPIC="$1"; ANGLE="$2"; shift 2 +CWD="$PWD"; COLS=100; ROWS=30; FONT=22; IDLE=2; PREFIX="$ANGLE"; BANNER="" +while [ $# -gt 0 ]; do + case "$1" in + --cwd) CWD="$2"; shift 2;; + --cols) COLS="$2"; shift 2;; + --rows) ROWS="$2"; shift 2;; + --font) FONT="$2"; shift 2;; + --idle) IDLE="$2"; shift 2;; + --out-prefix) PREFIX="$2"; shift 2;; + --banner) BANNER="$2"; shift 2;; # shown on screen before the run (headless hides the invocation) + --) shift; break;; + *) echo "unknown opt: $1" >&2; exit 2;; + esac +done +[ $# -ge 1 ] || { echo "ERROR: no command after --" >&2; exit 2; } + +need asciinema asciinema; need agg agg + +DD="$(demo_dir "$TOPIC")"; CLIPS="$DD/clips"; mkdir -p "$CLIPS" +CAST="$CLIPS/$PREFIX.cast"; GIF="$CLIPS/$PREFIX.gif" + +# Build a single shell string for the real command (asciinema --command takes one string). +CMD="$(printf '%q ' "$@")" +log "recording REAL run -> $CAST" +log " cwd: $CWD" +log " cmd: $CMD" + +# Optional on-screen banner so viewers see what was asked (headless mode hides the prompt). +# The real command is unchanged — the banner only echoes context, it does not fake output. +INNER="$CMD" +if [ -n "$BANNER" ]; then + INNER="printf '\033[1;36m\$ %s\033[0m\n\n' $(printf '%q' "$BANNER"); sleep 1; $CMD" +fi + +# Record. asciinema (v3) runs the command in a real pty and stops when it exits. +( cd "$CWD" && asciinema rec --overwrite --window-size "${COLS}x${ROWS}" \ + --command "bash -lc $(printf '%q' "$INNER")" "$CAST" ) + +log "rendering -> $GIF" +agg --cols "$COLS" --rows "$ROWS" --font-size "$FONT" --idle-time-limit "$IDLE" \ + "$CAST" "$GIF" + +log "done: $CAST + $GIF" +echo "$GIF" diff --git a/.claude/skills/demo-creator/scripts/setup_pyspark_fixture.sh b/.claude/skills/demo-creator/scripts/setup_pyspark_fixture.sh new file mode 100755 index 000000000..da47530dd --- /dev/null +++ b/.claude/skills/demo-creator/scripts/setup_pyspark_fixture.sh @@ -0,0 +1,62 @@ +#!/usr/bin/env bash +# Bootstrap a REAL local PySpark environment (local[*], no cluster) for a demo fixture. +# Idempotent. Prints the env exports the demo run needs. +# Usage: setup_pyspark_fixture.sh <fixture-dir> +set -euo pipefail +DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"; source "$DIR/lib.sh" + +FIXTURE="${1:?usage: setup_pyspark_fixture.sh <fixture-dir>}" +mkdir -p "$FIXTURE"; FIXTURE="$(cd "$FIXTURE" && pwd)" +log "fixture dir: $FIXTURE" + +# --- JDK (Spark needs a JVM). We pin PySpark to the 3.5 line, which supports Java 8/11/17. +# Many brew `openjdk@N` aliases actually resolve to the latest JDK (e.g. 23), which Spark +# rejects, so we VALIDATE the reported major version and accept only 8/11/17 (then 21). --- +java_major() { "$1/bin/java" -version 2>&1 | sed -nE 's/.*version "([0-9]+).*/\1/p' | head -1; } +JAVA_HOME="" +for cand in \ + "/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home" \ + "/opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk/Contents/Home" \ + "/opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk/Contents/Home"; do + [ -x "$cand/bin/java" ] || continue + m="$(java_major "$cand")" + case "$m" in 8|11|17|21) JAVA_HOME="$cand"; log "using JDK $m at $cand"; break;; esac +done +if [ -z "$JAVA_HOME" ]; then + log "no Spark-3.5-compatible JDK (8/11/17/21) found; installing openjdk@17" + brew install openjdk@17 + JAVA_HOME="/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home" +fi +export JAVA_HOME +log "JAVA_HOME=$JAVA_HOME ($(java_major "$JAVA_HOME"))" + +# --- Python venv + pyspark (3.5 line) via uv (fast). Fallback to python3 -m venv. --- +VENV="$FIXTURE/.venv" +PYSPARK_SPEC='pyspark>=3.5,<4' +if command -v uv >/dev/null 2>&1; then + ( cd "$FIXTURE" && uv venv "$VENV" >/dev/null 2>&1 || true; \ + VIRTUAL_ENV="$VENV" uv pip install --python "$VENV/bin/python" "$PYSPARK_SPEC" ) +else + python3 -m venv "$VENV"; "$VENV/bin/pip" install -q --upgrade pip "$PYSPARK_SPEC" +fi +log "pyspark installed in $VENV" + +# --- Smoke test: prove Spark actually runs here. --- +log "smoke-testing Spark local[*]..." +JAVA_HOME="$JAVA_HOME" "$VENV/bin/python" - <<'PY' +from pyspark.sql import SparkSession +s = SparkSession.builder.master("local[*]").appName("smoke").getOrCreate() +n = s.range(5).count() +s.stop() +assert n == 5, n +print("SPARK_SMOKE_OK rows=%d" % n) +PY + +cat <<EOF + +# ---- add these to your shell / the demo run env ---- +export JAVA_HOME="$JAVA_HOME" +export PATH="\$JAVA_HOME/bin:$VENV/bin:\$PATH" +# run pipeline: $VENV/bin/python <script>.py +EOF +log "fixture env ready." diff --git a/.claude/skills/demo-creator/templates/blog.md.tmpl b/.claude/skills/demo-creator/templates/blog.md.tmpl new file mode 100644 index 000000000..14715cb81 --- /dev/null +++ b/.claude/skills/demo-creator/templates/blog.md.tmpl @@ -0,0 +1,52 @@ +# {{TITLE — state the surprising result, not the topic}} + +> {{One-line deck: the claim you proved, in plain words.}} + +{{HOOK: open on the real pain. Lead with a verbatim quote from research.md and its source. +A data engineer on r/dataengineering put it bluntly:}} + +> "{{REAL_QUOTE}}" +> — {{source, linked: URL}} + +{{2–4 sentences: why this pain is common and what it actually costs. No "in today's +landscape". Get to the point.}} + +## The setup + +{{Describe the real fixture in 2–3 sentences. Link it. Make clear it's a genuine task with +a real defect/decision, not a toy with the answer pre-written.}} + +## Baseline: plain Claude Code + +{{What happened when we ran the same prompt through claude-code (`claude -p`), same model. +Embed the baseline clip. Be fair — show what it got right too.}} + +![baseline]({{clips/<angle>.baseline.gif}}) + +{{The specific thing it missed / got wrong, stated concretely.}} + +## With altimate-code + +![altimate-code]({{clips/<angle>.gif}}) + +{{The value moment, in prose. Name the EXACT capability — e.g. "altimate_core_rewrite with +verify_equivalence proved the rewrite returns identical rows" — not "the AI figured it +out". One quotable stat or coined phrase here.}} + +[Replayable terminal cast]({{clips/<angle>.cast}}) · [session trace id: {{TRACE_ID}}] + +## Why it worked (and where it wouldn't) + +{{The mechanism, briefly. Then the honest limits: where altimate-code wouldn't help, what +the baseline got right, what you'd still check yourself.}} + +## Replicate it yourself + +{{Exact, copy-pasteable steps — must match REPLICATE.md and the fixture scripts. A reader +should reach the same result.}} + +```bash +{{commands}} +``` + +## {{Discussable ending — a tradeoff or a question, not a summary.}} diff --git a/.claude/skills/demo-creator/templates/demo.tape.tmpl b/.claude/skills/demo-creator/templates/demo.tape.tmpl new file mode 100644 index 000000000..730cf4edf --- /dev/null +++ b/.claude/skills/demo-creator/templates/demo.tape.tmpl @@ -0,0 +1,25 @@ +# vhs tape template (SCRIPTED FALLBACK only — primary recorder is asciinema+agg). +# Replace {{PLACEHOLDERS}}. See references/recording.md. +Require altimate-code + +Output {{OUT_GIF}} # e.g. clips/schema-drift.gif +Set Shell "bash" +Set FontSize 22 +Set Width 1000 +Set Height 600 +Set Padding 16 +Set Theme "Dracula" + +# Establish the fixture state on screen (optional, keep short). +Type "cd {{FIXTURE_DIR}}" Enter +Sleep 1s + +# The real run. --yolo so it completes unattended; --max-turns caps runaway. +Type "altimate-code run --yolo --max-turns {{MAX_TURNS}} '{{PROMPT}}'" Enter + +# Wait for a real completion marker rather than a fixed sleep. +Wait@{{TIMEOUT}}s /{{DONE_REGEX}}/ # e.g. 240s /VERIFIED|session ended|Done/ + +# Freeze the value moment. +Screenshot {{FRAMES_DIR}}/value-moment.png +Sleep 3s diff --git a/.claude/skills/demo-creator/templates/replicate.md.tmpl b/.claude/skills/demo-creator/templates/replicate.md.tmpl new file mode 100644 index 000000000..6a90ee538 --- /dev/null +++ b/.claude/skills/demo-creator/templates/replicate.md.tmpl @@ -0,0 +1,36 @@ +# Replicate this demo yourself + +Everything here is real and reproducible. Estimated time: {{N}} minutes. + +## Prerequisites +- altimate-code installed: `curl -fsSL {{INSTALL_URL}} | bash` (or `npm i -g altimate-code`) +- {{Any topic-specific deps, e.g. a JDK + pyspark, or duckdb}} +- An LLM provider configured: `altimate-code providers` (auth) + +## 1. Get the fixture +```bash +{{clone/copy the demos/<topic>/fixture, or the setup script}} +bash {{setup_script}} +``` + +## 2. Reproduce the baseline (plain Claude Code) +```bash +{{the exact baseline command — e.g. claude -p --dangerously-skip-permissions --model <m> '<prompt>'}} +``` +Expected: {{what claude-code does — the gap}}. + +## 3. Run altimate-code (capability on) +```bash +{{the exact altimate command}} +``` +Expected: {{the value moment — what to look for}}. + +## 4. Verify it yourself +```bash +{{how to confirm the result is real — e.g. run the pipeline / query the output / altimate-code check}} +altimate-code trace list # your run is recorded; view it with `trace view <id>` +``` + +## Notes / gotchas +- {{anything that commonly trips people up}} +- Docker fallback (hermetic): {{command, if applicable}} From 13772ba13d70b9d19135a34641531eb61ea6ddf3 Mon Sep 17 00:00:00 2001 From: anandgupta42 <anand@altimate.ai> Date: Tue, 30 Jun 2026 17:43:42 -0700 Subject: [PATCH 2/2] fix: [demo-creator] address PR review comments Harden the skill's scripts and docs per the PR #979 review (Codex, CodeRabbit, Kilo, cubic): - `lib.sh`: `demo_dir` now rejects non-slug topics (`..`, `/`) so paths can't escape `demos/`. - `baseline_vs_altimate.sh`: guard empty-array expansion for bash 3.2 (`${arr[@]+"${arr[@]}"}`); fix `OPENCODE_PERMISSION` to top-level tool keys (`{"bash":"deny"}`); warn loudly when `--reset-from` or `--model` is omitted (contaminated fixture / mismatched model invalidate the comparison); fail the script when the altimate run fails instead of `|| true`. - `record.sh`: add `asciinema --return` and propagate a failed recorded command so a broken run isn't rendered and passed off as a good clip. - `build_artifact.sh`: fix path-containment (realpath + `os.path.commonpath`, not string prefix); discover a Chrome/Chromium on PATH and hard-fail `--render` if none (render-inspection is mandatory). - `inspect.sh`: cross-check a `.baseline` clip against the baseline JSON, and count frames with `find` so a no-match doesn't trip `set -e`. - `setup_pyspark_fixture.sh`: honor an existing `JAVA_HOME`/`java`/`java_home` before Homebrew (works on Linux / Intel mac); robust major-version parse for `1.8`-style strings; validate prerequisites. - `extract_learnings.sh`: display the real source filename, not the temp copy. - `SKILL.md`: authenticity must come from the SAME recorded run (its `--trace`), not a second `--format json` run; add a language tag to the tree code block. - `blog.md.tmpl`: mark the baseline section comparison-only (showcase is default). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --- .claude/skills/demo-creator/SKILL.md | 17 +++-- .../scripts/baseline_vs_altimate.sh | 30 +++++++-- .../demo-creator/scripts/build_artifact.sh | 34 +++++++--- .../demo-creator/scripts/extract_learnings.sh | 11 +++- .../skills/demo-creator/scripts/inspect.sh | 10 ++- .claude/skills/demo-creator/scripts/lib.sh | 8 ++- .claude/skills/demo-creator/scripts/record.sh | 16 ++++- .../scripts/setup_pyspark_fixture.sh | 65 ++++++++++++++----- .../demo-creator/templates/blog.md.tmpl | 5 +- 9 files changed, 148 insertions(+), 48 deletions(-) diff --git a/.claude/skills/demo-creator/SKILL.md b/.claude/skills/demo-creator/SKILL.md index c4898f3e4..40d6584b5 100644 --- a/.claude/skills/demo-creator/SKILL.md +++ b/.claude/skills/demo-creator/SKILL.md @@ -89,7 +89,7 @@ software behind it. Those don't need a recorded real run. Create `demos/<topic-slug>/` at the repo root: -``` +```text demos/<topic-slug>/ research.md # phase 1: real pains + verbatim quotes + sources demo-spec.md # phase 3: angles, fixtures, prompts, baseline controls @@ -152,14 +152,19 @@ advance until it passes. ### Phase 5 — Record (real runs) - **Showcase mode:** `scripts/record.sh <topic> <angle> --cwd <fixture> --banner '<prompt>' -- altimate-code run --yolo --trace '<prompt>'` records the real run with **asciinema** → - `.cast`, then renders to `.gif` with **agg**. Capture the json too: - `altimate-code run --format json … > runs/<angle>.json` (authenticity proof). `vhs` is the - optional scripted-polish fallback (see `references/recording.md`). + `.cast`, then renders to `.gif` with **agg**. `vhs` is the optional scripted-polish + fallback (see `references/recording.md`). +- **Authenticity comes from the SAME recorded run, not a re-run.** The clip's proof is that + `.cast` PLUS the `--trace` written by *that* invocation (grab its trace id from the cast / + `altimate-code trace list`). Do NOT run `altimate-code run --format json` a second time to + "get the json" — a fresh agent run is nondeterministic and would authenticate a different + session than the one on screen. (In comparison mode, `baseline_vs_altimate.sh` captures + `--format json` from the very run it records, which is fine — one run, one artifact.) - **Comparison mode (optional):** also run `scripts/baseline_vs_altimate.sh <topic> <angle> --cwd <fixture> --prompt '...' --reset-from <pristine>` to capture the `claude -p` control on the same prompt/model, and record its clip too. -- **Gate:** `.cast` + `.gif` + `runs/<angle>.json` exist for the altimate run (plus the - baseline variant in comparison mode). +- **Gate:** `.cast` + `.gif` (+ the recorded run's trace id) exist for the altimate run (plus + the baseline variant in comparison mode). ### Phase 6 — Visual inspection (use your eyes) - `scripts/inspect.sh <topic> <angle>` extracts key PNG frames. **Read them** and confirm: diff --git a/.claude/skills/demo-creator/scripts/baseline_vs_altimate.sh b/.claude/skills/demo-creator/scripts/baseline_vs_altimate.sh index fc67ed85c..722721586 100755 --- a/.claude/skills/demo-creator/scripts/baseline_vs_altimate.sh +++ b/.claude/skills/demo-creator/scripts/baseline_vs_altimate.sh @@ -40,6 +40,19 @@ need altimate-code DD="$(demo_dir "$TOPIC")"; RUNS="$DD/runs"; mkdir -p "$RUNS" +# The comparison is only fair if BOTH runs start from the identical fixture. Without a +# pristine reset, the baseline run's edits leak into the altimate run's starting state. +if [ -z "$RESET_FROM" ]; then + log "WARNING: no --reset-from given. The baseline run can mutate $CWD, so the altimate run" + log " would start from contaminated state. Pass --reset-from <pristine> for a valid comparison." +fi +# Model control: baseline defaults to '$BASE_MODEL'; if altimate --model is unset it uses the +# configured default, which may differ. Warn so a win/loss can't come from model selection. +if [ -z "$MODEL" ]; then + log "WARNING: no --model for the altimate run; it will use the configured default. Pass --model" + log " so it matches --baseline-model ('$BASE_MODEL') and the only variable is altimate-code." +fi + reset_fixture() { [ -n "$RESET_FROM" ] || return 0 [ -d "$RESET_FROM" ] || { echo "ERROR: --reset-from '$RESET_FROM' is not a dir" >&2; return 1; } @@ -48,6 +61,7 @@ reset_fixture() { } # ---------------- baseline ---------------- +# The baseline is EXPECTED to sometimes struggle, so its failure doesn't fail the script. log "=== BASELINE run ($BASE_MODE) ===" reset_fixture if [ "$BASE_MODE" = "claude" ]; then @@ -56,24 +70,30 @@ if [ "$BASE_MODE" = "claude" ]; then --output-format json --model "$BASE_MODEL" "$PROMPT" ) \ | tee "$RUNS/$ANGLE.baseline.json" || true else - # In-fork isolation: run altimate-code but strip the capability under test. + # In-fork isolation: run altimate-code but strip the capability under test. Permission keys + # are top-level tool names (see config.ts permission schema), e.g. {"bash":"deny"}. B_ARGS=(); [ -n "$BASE_AGENT" ] && B_ARGS+=(--agent "$BASE_AGENT") M_ARGS=(); [ -n "$MODEL" ] && M_ARGS+=(--model "$MODEL") if [ -n "$BASE_DENY" ]; then - export OPENCODE_PERMISSION="{\"tools\":{$(echo "$BASE_DENY" | awk -F, '{for(i=1;i<=NF;i++){printf "%s\"%s\":\"deny\"", (i>1?",":""), $i}}')}}" + export OPENCODE_PERMISSION="{$(echo "$BASE_DENY" | awk -F, '{for(i=1;i<=NF;i++){printf "%s\"%s\":\"deny\"", (i>1?",":""), $i}}')}" log " baseline OPENCODE_PERMISSION=$OPENCODE_PERMISSION" fi ( cd "$CWD" && altimate-code run --yolo --max-turns "$MAXT" --format json --trace \ - "${M_ARGS[@]}" "${B_ARGS[@]}" "$PROMPT" ) | tee "$RUNS/$ANGLE.baseline.json" || true + ${M_ARGS[@]+"${M_ARGS[@]}"} ${B_ARGS[@]+"${B_ARGS[@]}"} "$PROMPT" ) | tee "$RUNS/$ANGLE.baseline.json" || true unset OPENCODE_PERMISSION fi # ---------------- altimate ---------------- +# This run is the evidence — if it doesn't complete, the comparison is invalid, so fail loudly. log "=== ALTIMATE run ===" reset_fixture M_ARGS=(); [ -n "$MODEL" ] && M_ARGS+=(--model "$MODEL") -( cd "$CWD" && altimate-code run --yolo --max-turns "$MAXT" --format json --trace \ - "${M_ARGS[@]}" "$PROMPT" ) | tee "$RUNS/$ANGLE.json" || true +set -o pipefail +if ! ( cd "$CWD" && altimate-code run --yolo --max-turns "$MAXT" --format json --trace \ + ${M_ARGS[@]+"${M_ARGS[@]}"} "$PROMPT" ) | tee "$RUNS/$ANGLE.json"; then + echo "ERROR: the altimate-code run failed — comparison evidence is invalid. Not reporting success." >&2 + exit 1 +fi log "json captured: $RUNS/$ANGLE.baseline.json + $RUNS/$ANGLE.json" log "fair-comparison check: baseline model='$BASE_MODEL', altimate model='${MODEL:-<default>}' — keep these equivalent." diff --git a/.claude/skills/demo-creator/scripts/build_artifact.sh b/.claude/skills/demo-creator/scripts/build_artifact.sh index 23e90ff68..6fcb46b6d 100755 --- a/.claude/skills/demo-creator/scripts/build_artifact.sh +++ b/.claude/skills/demo-creator/scripts/build_artifact.sh @@ -31,9 +31,12 @@ tpl, base, out = sys.argv[1:4] html = open(tpl, encoding="utf-8").read() mimetypes.add_type("image/svg+xml", ".svg") inlined = [0]; missing = [] +base_real = os.path.realpath(base) def datauri(rel): - p = os.path.normpath(os.path.join(base, rel)) - if not (p.startswith(os.path.abspath(base)) and os.path.isfile(p)): + # realpath resolves symlinks; commonpath confines to base (a plain startswith prefix + # would let a sibling dir sharing base's name — e.g. base/../base-secret — escape). + p = os.path.realpath(os.path.join(base, rel)) + if not (os.path.isfile(p) and os.path.commonpath([p, base_real]) == base_real): missing.append(rel); return None mime = mimetypes.guess_type(p)[0] or "application/octet-stream" inlined[0] += 1 @@ -52,16 +55,27 @@ if missing: print("WARNING: could not inline (will break under CSP): %s" % ", ". PY if [ "$RENDER" = "1" ]; then - CHROME="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" + # Discover a Chromium-family browser on PATH or in the usual macOS bundle locations. + CHROME="" + for c in "${CHROME_BIN:-}" google-chrome google-chrome-stable chromium chromium-browser chrome \ + "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ + "/Applications/Chromium.app/Contents/MacOS/Chromium" \ + "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge"; do + [ -n "$c" ] || continue + if command -v "$c" >/dev/null 2>&1; then CHROME="$(command -v "$c")"; break; fi + [ -x "$c" ] && { CHROME="$c"; break; } + done SHOT="$DD/clips/frames/_artifact_render.png"; mkdir -p "$(dirname "$SHOT")" - if [ -x "$CHROME" ]; then - "$CHROME" --headless=new --disable-gpu --hide-scrollbars --force-device-scale-factor=1 \ - --window-size="$W,$H" --screenshot="$SHOT" "file://$OUTPATH" 2>/dev/null - log "rendered for inspection -> $SHOT (Read it; check empty grid cells, mostly-empty GIFs, no h-scroll)" - echo "$SHOT" - else - log "Chrome not found; render skipped. Open $OUTPATH manually to inspect before publishing." + if [ -z "$CHROME" ]; then + # Render-inspection is mandatory before publishing, so a missing browser is a hard error. + echo "ERROR: --render requested but no Chrome/Chromium found. Install one or set CHROME_BIN." >&2 + echo " (Do NOT publish an artifact you haven't visually inspected.)" >&2 + exit 1 fi + "$CHROME" --headless=new --disable-gpu --hide-scrollbars --force-device-scale-factor=1 \ + --window-size="$W,$H" --screenshot="$SHOT" "file://$OUTPATH" 2>/dev/null + log "rendered for inspection -> $SHOT (Read it; check empty grid cells, mostly-empty GIFs, no h-scroll)" + echo "$SHOT" fi log "artifact ready: $OUTPATH (publish with the Artifact tool; redeploy to the same path on edits)" echo "$OUTPATH" diff --git a/.claude/skills/demo-creator/scripts/extract_learnings.sh b/.claude/skills/demo-creator/scripts/extract_learnings.sh index 0f33c5099..432db8dea 100755 --- a/.claude/skills/demo-creator/scripts/extract_learnings.sh +++ b/.claude/skills/demo-creator/scripts/extract_learnings.sh @@ -22,13 +22,18 @@ LOG="$DD/runs/$ANGLE.log"; [ -f "$LOG" ] || LOG="$DD/runs/altimate.log" OUT="$DD/learnings.md" # Pull the source the run actually produced. Prefer json; fall back to the captured log. +CAST="$DD/clips/$ANGLE.cast" SRC="" [ -f "$JSON" ] && SRC="$JSON" [ -z "$SRC" ] && [ -f "$LOG" ] && SRC="$LOG" -[ -n "$SRC" ] || { echo "ERROR: no run json/log for $TOPIC/$ANGLE (runs/$ANGLE.json or runs/altimate.log)" >&2; exit 1; } +# Fall back to the recorded asciinema cast — it IS the run record (JSONL; output text is greppable). +[ -z "$SRC" ] && [ -f "$CAST" ] && SRC="$CAST" +[ -n "$SRC" ] || { echo "ERROR: no run source for $TOPIC/$ANGLE (runs/$ANGLE.json, runs/altimate.log, or clips/$ANGLE.cast)" >&2; exit 1; } log "mining $SRC -> $OUT" -# Strip ANSI escape codes so the mined evidence is clean (formatted logs are full of them). +# Remember the real source for display; mine from an ANSI-stripped copy (formatted logs are +# full of escape codes). SRC below points at the temp copy, SRC_NAME at the real file. +SRC_NAME="$(basename "$SRC")" CLEAN="$(mktemp)"; trap 'rm -f "$CLEAN"' EXIT perl -pe 's/\x1b\[[0-9;]*[a-zA-Z]//g' "$SRC" > "$CLEAN" 2>/dev/null || cp "$SRC" "$CLEAN" SRC="$CLEAN" @@ -54,7 +59,7 @@ TRACE_ID="$(grep -oE 'ses_[A-Za-z0-9]+' "$SRC" 2>/dev/null | head -1)" { echo "# Learnings — $TOPIC / $ANGLE" echo - echo "_Mined from \`$(basename "$SRC")\`$( [ -n "$TRACE_ID" ] && echo " · trace \`$TRACE_ID\`")._" + echo "_Mined from \`$SRC_NAME\`$( [ -n "$TRACE_ID" ] && echo " · trace \`$TRACE_ID\`")._" echo "_All items below are extracted from the REAL run. The 'why it matters' and reusable" echo "lessons are filled in by the author after reading this — do not invent value the run" echo "did not show (law #2)._" diff --git a/.claude/skills/demo-creator/scripts/inspect.sh b/.claude/skills/demo-creator/scripts/inspect.sh index 0717dcc5c..a5a6e4f5f 100755 --- a/.claude/skills/demo-creator/scripts/inspect.sh +++ b/.claude/skills/demo-creator/scripts/inspect.sh @@ -25,11 +25,17 @@ mkdir -p "$FRAMES"; rm -f "$FRAMES"/f_*.png log "extracting frames @ ${FPS}fps -> $FRAMES" ffmpeg -y -loglevel error -i "$GIF" -vf "fps=$FPS" "$FRAMES/f_%03d.png" -COUNT=$(ls "$FRAMES"/f_*.png 2>/dev/null | wc -l | tr -d ' ') +# Count without letting a no-match ls abort us under `set -e`/pipefail (lib.sh sets both). +COUNT=$(find "$FRAMES" -name 'f_*.png' 2>/dev/null | wc -l | tr -d ' ') log "wrote $COUNT frames. Read them to verify legibility + the value moment." # --- Authenticity cross-check pointers --- -JSON="$DD/runs/$ANGLE.json" +# Reconcile against the run that produced THIS clip: a baseline prefix (<angle>.baseline) +# must be checked against the baseline json, not the altimate one. +case "$PREFIX" in + *.baseline) JSON="$DD/runs/$ANGLE.baseline.json";; + *) JSON="$DD/runs/$ANGLE.json";; +esac if [ -f "$JSON" ]; then log "json event stream present: $JSON" echo "----- tool calls observed in the run (cross-check against the clip) -----" diff --git a/.claude/skills/demo-creator/scripts/lib.sh b/.claude/skills/demo-creator/scripts/lib.sh index dfbe228bf..a208f5077 100755 --- a/.claude/skills/demo-creator/scripts/lib.sh +++ b/.claude/skills/demo-creator/scripts/lib.sh @@ -6,7 +6,13 @@ set -euo pipefail SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" REPO_ROOT="$(cd "$SKILL_DIR/../../.." && pwd)" -demo_dir() { # demo_dir <topic-slug> +demo_dir() { # demo_dir <topic-slug> — reject anything that isn't a plain slug so the + # path can't escape demos/ (a `..` or `/` in the slug would). + case "${1:-}" in + ''|*[!a-zA-Z0-9._-]*|*..*) + echo "demo_dir: invalid topic slug '${1:-}' (use letters, digits, '.', '_', '-')" >&2 + return 1;; + esac echo "$REPO_ROOT/demos/$1" } diff --git a/.claude/skills/demo-creator/scripts/record.sh b/.claude/skills/demo-creator/scripts/record.sh index 53d2dc62b..ad4dbaf38 100755 --- a/.claude/skills/demo-creator/scripts/record.sh +++ b/.claude/skills/demo-creator/scripts/record.sh @@ -28,6 +28,10 @@ done need asciinema asciinema; need agg agg +# asciinema (Rust) rejects args that aren't valid UTF-8 under a non-UTF-8 locale. Force one +# so prompts/banners with non-ASCII (em dashes, arrows, accents) don't error out. +export LC_ALL="${LC_ALL:-en_US.UTF-8}" LANG="${LANG:-en_US.UTF-8}" + DD="$(demo_dir "$TOPIC")"; CLIPS="$DD/clips"; mkdir -p "$CLIPS" CAST="$CLIPS/$PREFIX.cast"; GIF="$CLIPS/$PREFIX.gif" @@ -45,12 +49,20 @@ if [ -n "$BANNER" ]; then fi # Record. asciinema (v3) runs the command in a real pty and stops when it exits. -( cd "$CWD" && asciinema rec --overwrite --window-size "${COLS}x${ROWS}" \ - --command "bash -lc $(printf '%q' "$INNER")" "$CAST" ) +# --return makes asciinema exit with the recorded command's status (default is always 0), +# so a failed `altimate-code run` doesn't get rendered and passed off as a good demo. +rec_rc=0 +( cd "$CWD" && asciinema rec --overwrite --return --window-size "${COLS}x${ROWS}" \ + --command "bash -lc $(printf '%q' "$INNER")" "$CAST" ) || rec_rc=$? log "rendering -> $GIF" agg --cols "$COLS" --rows "$ROWS" --font-size "$FONT" --idle-time-limit "$IDLE" \ "$CAST" "$GIF" +if [ "$rec_rc" -ne 0 ]; then + echo "ERROR: the recorded command exited $rec_rc — this clip captured a FAILED run." >&2 + echo " Cast+GIF kept for debugging ($CAST), but do NOT ship this clip. Fix and re-record." >&2 + exit "$rec_rc" +fi log "done: $CAST + $GIF" echo "$GIF" diff --git a/.claude/skills/demo-creator/scripts/setup_pyspark_fixture.sh b/.claude/skills/demo-creator/scripts/setup_pyspark_fixture.sh index da47530dd..c16cc8d7f 100755 --- a/.claude/skills/demo-creator/scripts/setup_pyspark_fixture.sh +++ b/.claude/skills/demo-creator/scripts/setup_pyspark_fixture.sh @@ -9,27 +9,56 @@ FIXTURE="${1:?usage: setup_pyspark_fixture.sh <fixture-dir>}" mkdir -p "$FIXTURE"; FIXTURE="$(cd "$FIXTURE" && pwd)" log "fixture dir: $FIXTURE" -# --- JDK (Spark needs a JVM). We pin PySpark to the 3.5 line, which supports Java 8/11/17. -# Many brew `openjdk@N` aliases actually resolve to the latest JDK (e.g. 23), which Spark -# rejects, so we VALIDATE the reported major version and accept only 8/11/17 (then 21). --- -java_major() { "$1/bin/java" -version 2>&1 | sed -nE 's/.*version "([0-9]+).*/\1/p' | head -1; } -JAVA_HOME="" -for cand in \ - "/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home" \ - "/opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk/Contents/Home" \ - "/opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk/Contents/Home"; do - [ -x "$cand/bin/java" ] || continue - m="$(java_major "$cand")" - case "$m" in 8|11|17|21) JAVA_HOME="$cand"; log "using JDK $m at $cand"; break;; esac -done -if [ -z "$JAVA_HOME" ]; then - log "no Spark-3.5-compatible JDK (8/11/17/21) found; installing openjdk@17" - brew install openjdk@17 - JAVA_HOME="/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home" +# --- JDK (Spark needs a JVM). We pin PySpark to the 3.5 line, which supports Java 8/11/17 +# (and 21). We check, in order: an existing JAVA_HOME, a `java` on PATH, `/usr/libexec/ +# java_home` (macOS), then Homebrew kegs — accepting only a Spark-compatible major +# version. Homebrew install is the LAST resort, so this works on Linux / Intel mac / +# already-configured shells, not just Apple-Silicon Homebrew. --- +# Handles both "1.8.0_xxx" (major 8) and "17.0.1" (major 17) version strings. +java_major() { + "$1/bin/java" -version 2>&1 | sed -nE 's/.*version "([0-9]+)(\.([0-9]+))?.*/\1 \3/p' | head -1 \ + | awk '{ if ($1==1) print $2; else print $1 }' +} +compat() { case "$1" in 8|11|17|21) return 0;; *) return 1;; esac; } + +JAVA_HOME_FOUND="" +add_cand() { # try a candidate home; keep it if java is executable AND version is compatible + [ -n "$JAVA_HOME_FOUND" ] && return 0 + [ -n "${1:-}" ] && [ -x "$1/bin/java" ] || return 0 + local m; m="$(java_major "$1")" + if compat "$m"; then JAVA_HOME_FOUND="$1"; log "using JDK $m at $1"; fi +} +# 1) caller's JAVA_HOME 2) java already on PATH 3) macOS java_home for 17/11/8 +add_cand "${JAVA_HOME:-}" +if [ -z "$JAVA_HOME_FOUND" ] && command -v java >/dev/null 2>&1; then + add_cand "$(cd "$(dirname "$(command -v java)")/.." && pwd)" +fi +if [ -z "$JAVA_HOME_FOUND" ] && [ -x /usr/libexec/java_home ]; then + for v in 17 11 21 8; do add_cand "$(/usr/libexec/java_home -v "$v" 2>/dev/null)"; done +fi +# 4) Homebrew kegs (Apple Silicon + Intel prefixes) +if [ -z "$JAVA_HOME_FOUND" ]; then + for pfx in /opt/homebrew/opt /usr/local/opt; do + for v in 17 11 21; do add_cand "$pfx/openjdk@$v/libexec/openjdk.jdk/Contents/Home"; done + done fi -export JAVA_HOME +# 5) last resort: install via Homebrew if available +if [ -z "$JAVA_HOME_FOUND" ]; then + if command -v brew >/dev/null 2>&1; then + log "no Spark-compatible JDK (8/11/17/21) found; installing openjdk@17 via Homebrew" + brew install openjdk@17 + add_cand "$(brew --prefix openjdk@17)/libexec/openjdk.jdk/Contents/Home" + add_cand "$(brew --prefix openjdk@17)" + fi +fi +[ -n "$JAVA_HOME_FOUND" ] || { echo "ERROR: no Spark-compatible JDK (Java 8/11/17/21) found. Install one and/or set JAVA_HOME." >&2; exit 1; } +export JAVA_HOME="$JAVA_HOME_FOUND" log "JAVA_HOME=$JAVA_HOME ($(java_major "$JAVA_HOME"))" +# Prerequisites for the venv step. +command -v uv >/dev/null 2>&1 || command -v python3 >/dev/null 2>&1 || { + echo "ERROR: need 'uv' or 'python3' to create the pyspark venv." >&2; exit 1; } + # --- Python venv + pyspark (3.5 line) via uv (fast). Fallback to python3 -m venv. --- VENV="$FIXTURE/.venv" PYSPARK_SPEC='pyspark>=3.5,<4' diff --git a/.claude/skills/demo-creator/templates/blog.md.tmpl b/.claude/skills/demo-creator/templates/blog.md.tmpl index 14715cb81..adc061566 100644 --- a/.claude/skills/demo-creator/templates/blog.md.tmpl +++ b/.claude/skills/demo-creator/templates/blog.md.tmpl @@ -16,6 +16,8 @@ landscape". Get to the point.}} {{Describe the real fixture in 2–3 sentences. Link it. Make clear it's a genuine task with a real defect/decision, not a toy with the answer pre-written.}} +<!-- COMPARISON MODE ONLY — delete this whole section in showcase mode (the default), which + produces no baseline clip. Keep it only if you ran baseline_vs_altimate.sh. --> ## Baseline: plain Claude Code {{What happened when we ran the same prompt through claude-code (`claude -p`), same model. @@ -24,8 +26,9 @@ Embed the baseline clip. Be fair — show what it got right too.}} ![baseline]({{clips/<angle>.baseline.gif}}) {{The specific thing it missed / got wrong, stated concretely.}} +<!-- END comparison-only section --> -## With altimate-code +## {{Showcase: "What altimate-code did" · Comparison: "With altimate-code"}} ![altimate-code]({{clips/<angle>.gif}})