A focused reimplementation of the Ralph
autonomous coding loop in Rust. Slimmer than the original bash version, with
pluggable agents (codex, claude, or mock), per-iteration jj commits,
and in-place execution (no branch switching).
The binary is named riveter (the tool); the methodology it implements is
Ralph (one focused story per fresh agent context, persistent state on disk,
jj as the review surface).
- Fresh context per story. Each iteration spawns a new agent process with no memory of prior iterations. The agent's working set is exactly one user story + the codebase + the progress log — never the full transcript of past attempts. This keeps the context window small and focused, which is the single biggest lever on agent quality and cost.
- Persistent state lives on disk, not in-context. `prd.toml` tracks which stories are done: the agent reads it (plus its assigned story) at the start of every run and flips `passes = true` when the story's acceptance criteria are met.
- `jj` is the review surface. Because every iteration is one `jj` commit on `@`, you review Riveter's work the same way you review your own: `jj log`, `jj diff -r <id>`, `jj abandon <id>` to reject, `jj squash` / `jj split` to reshape. No branch dance, no PR ceremony, no merge conflicts with your in-flight work.
- Pluggable agents. Codex and Claude are interchangeable: pick the model that's good at the kind of work the spec needs. Same loop, same artifacts, same review flow.

Riveter follows the Ralph methodology faithfully, but a handful of design choices were deliberately changed from the original Ralph bash implementation:
- No `progress.txt`-style early-termination signal. Ralph greps the last 20 lines of agent output for `<promise>COMPLETE</promise>` and exits the loop on match. That's brittle: agents fail early, get rate-limited, or print the sentinel as part of their own instruction echo, and the loop exits prematurely. Riveter terminates only when re-reading `prd.toml` shows every story `passes = true`. The agent flipping that flag is the one authoritative signal.
- No feature branches. Ralph creates/checks out a `branchName` from the PRD on every iteration. In practice branches get in the way of stacking, in-flight human edits, and any kind of human-in-the-loop review: every iteration becomes a merge negotiation. Riveter works in-place on `@` and never touches branches or bookmarks. You stay on whatever you were on, and Riveter's commits land on top of your work like any other commit.
- `jj` instead of `git`. `jj`'s inverted workflow (create a revision before editing, not after) maps cleanly onto an autonomous loop: each iteration is one `jj describe` + `jj new`, with no staging step to forget and no "did the agent commit?" ambiguity. Review becomes `jj log -r 'description(glob:"[[]RIVETER*")'`; rejection becomes `riveter reject` (which amends, not abandons). `jj`'s lossless operation log also means a broken iteration is never destructive: you can always `jj op undo`.
- `prd.toml` instead of `prd.json`. TOML allows comments, trailing commas don't matter, and the format reads cleanly when humans inspect or edit it mid-run. It's still trivial for the agent to parse and round-trip via `toml_edit` without reformatting the human's hand-written sections.
- Array order, not a `priority` field. Ralph stories carry an integer `priority` and the agent picks "highest priority pending". In practice the priority field is just a second representation of "what order should we do these in", which is confusing for both humans (which number means more important, 1 or 10?) and agents (do priority ties break by id?). Riveter stories are executed strictly in array order. To reorder, you reorder the array. The `id` field is purely a stable reference for commit messages and `riveter reject`.
- Quality-check / clean-fix / browser-test rules live in the prompt template, not external `AGENTS.md`/`CLAUDE.md` files. Ralph asks the agent to read+update `AGENTS.md`/`CLAUDE.md` files in nearby directories. Riveter inlines those rules directly into the rendered prompt every iteration. Trade-off: the rules aren't automatically loaded when the same agent works in your repo outside a Riveter run, and they aren't shared across runs. Win: the rules are guaranteed to be in the context window for every Riveter iteration, you don't have to maintain parallel `AGENTS.md` copies, and you can tweak them once in the Riveter template instead of per-project. Project-specific conventions still flow through `learnings.md` (per-run) and your existing `AGENTS.md` files (which the agent will read when it touches nearby code), so you don't lose anything important.

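The array-order and termination rules above can be sketched in a few lines of Rust. This is a minimal illustration, not Riveter's actual source: the `Story` struct and the `next_pending` / `all_pass` helpers are hypothetical names, and parsing `prd.toml` is left out so the block needs only the standard library.

```rust
/// Hypothetical in-memory view of one `[[stories]]` entry.
/// The real schema may carry more fields (for example `notes`).
#[derive(Debug, Clone, PartialEq)]
struct Story {
    id: u32,
    title: String,
    passes: bool,
}

/// Returns the first story, in array order, that has not passed yet.
/// There is no priority field: position in the slice is the order.
fn next_pending(stories: &[Story]) -> Option<&Story> {
    stories.iter().find(|s| !s.passes)
}

/// The loop is finished only when every story has `passes = true`.
fn all_pass(stories: &[Story]) -> bool {
    stories.iter().all(|s| s.passes)
}

fn main() {
    let mut stories = vec![
        Story { id: 1, title: "Add priority field".to_string(), passes: true },
        Story { id: 2, title: "Show priority badge".to_string(), passes: false },
        Story { id: 3, title: "Filter by priority".to_string(), passes: false },
    ];

    while let Some(story) = next_pending(&stories) {
        let id = story.id;
        println!("next story: #{} {}", id, story.title);
        // Stand-in for a real agent run: flip the flag for this story.
        for s in stories.iter_mut().filter(|s| s.id == id) {
            s.passes = true;
        }
    }
    assert!(all_pass(&stories));
}
```

Because termination depends only on the flags, an agent that crashes or prints a sentinel string by accident cannot end the loop early in this model.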
`riveter install-skill` stages the bundled skill at `~/.riveter/skill-staging/` and hands off to the
interactive `npx skills add` UI, where you pick
your agents, scope (project/global), and install mode. Requires `npx`
on `PATH`.
Inside any agent session (Codex, Claude, Amp, ...) that has access to the
`riveter-create-run` skill installed above, feed it your spec:

    Riveter create run: add high/medium/low priority to tasks (default medium); show colored badges; filter by priority.

The skill writes `<state-dir>/runs/<runId>/{prd.toml,spec.md}` and prints the
`runId`, e.g. `task-priority-a1b2c3d4`.

    $EDITOR ~/.riveter/runs/task-priority-a1b2c3d4/prd.toml

    cd ~/projects/my-app     # a jj-managed repo
    riveter run -r task-priority-a1b2c3d4 \
      -a codex -m gpt-5.5 -t high

`riveter` never switches branches/bookmarks. It commits each iteration on `@`
and then opens a fresh empty commit on top. Resume by re-running the same
command; passing stories are skipped.
By default Riveter does not stream the agent's chatter to your terminal.
You'll see iteration banners, [agent], [jj], [prd], and [reject]
progress lines from Riveter itself, plus a tail -f hint per iteration so
you can watch the agent live if you want:

    ============================================================
    iteration 1/10 (#1) · #1 "Add priority field to tasks table"
    ============================================================
    [agent] codex (model=gpt-5.5, thinking=high)
    [agent] live output suppressed. To watch this step:
    tail -f ~/.riveter/runs/task-priority-a1b2c3d4/iterations/001/stdout.log
    tail -f ~/.riveter/runs/task-priority-a1b2c3d4/iterations/001/stderr.log

Pass `--show-agent` if you want the live `│ ...`-prefixed tee in your
terminal anyway. The full transcript is always written to disk regardless.

    jj log -r 'description(glob:"[[]RIVETER*")'   # all Riveter commits
    jj diff -r <change_id>                         # one iteration
    jj abandon <change_id>                         # reject a bad iteration

Commit subjects encode the run id, story id, and model:

    [RIVETER(task-priority-a1b2c3d4,#2,gpt-5.5)] chore: Display priority badge on task cards

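As a rough sketch, a subject in this shape could be built and parsed back as follows. `commit_subject` and `parse_subject` are hypothetical helpers written for illustration, not Riveter's real API, and the parser assumes the run id contains no commas.

```rust
/// Builds a commit subject in the `[RIVETER(<runId>,#<id>,<model>)] chore: <title>` shape.
fn commit_subject(run_id: &str, story_id: u32, model: &str, title: &str) -> String {
    format!("[RIVETER({run_id},#{story_id},{model})] chore: {title}")
}

/// Extracts (run id, story id, model) from a subject, or `None` if it
/// does not start with a well-formed `[RIVETER(...)] ` header.
fn parse_subject(subject: &str) -> Option<(String, u32, String)> {
    let rest = subject.strip_prefix("[RIVETER(")?;
    // Everything before the first ")] " is the comma-separated header.
    let (inner, _title) = rest.split_once(")] ")?;
    let mut parts = inner.splitn(3, ',');
    let run_id = parts.next()?;
    let story_id = parts.next()?.strip_prefix('#')?.parse().ok()?;
    let model = parts.next()?;
    Some((run_id.to_string(), story_id, model.to_string()))
}

fn main() {
    let subject = commit_subject(
        "task-priority-a1b2c3d4",
        2,
        "gpt-5.5",
        "Display priority badge on task cards",
    );
    println!("{subject}");
    println!("{:?}", parse_subject(&subject));
}
```

Keeping the run id and story id in the subject is what lets a single `jj log` revset find every commit from one run.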
Instead of jj abandon-ing the bad commit (which throws away the agent's
attempt entirely), Riveter has a first-class reject workflow that keeps
the rejected commit in history and surfaces it to the next iteration as
prompt context so the agent can learn from what was tried:

    riveter reject -r <runId> [-c <change_id>] [-m "reason for rejection"]

What it does:

- Resolves the target jj change. If `-c` is omitted, it picks the most recent `[RIVETER(<runId>,...)]` commit reachable from `@`.
- Amends that commit's description from `[RIVETER(<runId>,#<id>,<model>)] chore: <title>` to `[RIVETER-REJECTED(<runId>,#<id>,<model>)] chore: <title> | rejected: <reason>`, with a reviewer note appended to the body. The commit and its diff stay in history.
- Flips that story's `passes` flag back to `false` in `prd.toml`.

On the next riveter run -r <runId>, the loop scans jj log for any
[RIVETER-REJECTED(<runId>,#<storyId>,*)] commits reachable from @ and
includes each one's description + diff in the prompt under a "Previously
rejected attempts on this story" section. The agent is instructed to read
them and avoid repeating the same mistakes.
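The subject rewrite and the later scan for rejected attempts can be illustrated with plain string handling. This is a sketch under assumptions: `rejected_subject` and `is_rejected_attempt` are hypothetical names, the reviewer note in the commit body is ignored, and the real tool reads subjects from `jj log` rather than from string literals.

```rust
/// Rewrites `[RIVETER(...)] chore: <title>` into
/// `[RIVETER-REJECTED(...)] chore: <title> | rejected: <reason>`.
/// Returns `None` when the subject does not start with `[RIVETER(`,
/// which also covers subjects that are already marked as rejected.
fn rejected_subject(subject: &str, reason: &str) -> Option<String> {
    let rest = subject.strip_prefix("[RIVETER(")?;
    Some(format!("[RIVETER-REJECTED({rest} | rejected: {reason}"))
}

/// True when `subject` marks a rejected attempt for this run and story,
/// whatever the model was. The trailing comma keeps story #2 from
/// matching story #21.
fn is_rejected_attempt(subject: &str, run_id: &str, story_id: u32) -> bool {
    let prefix = format!("[RIVETER-REJECTED({run_id},#{story_id},");
    subject.starts_with(prefix.as_str())
}

fn main() {
    let original = "[RIVETER(task-priority-a1b2c3d4,#2,gpt-5.5)] chore: Display priority badge on task cards";
    let rejected = rejected_subject(original, "badge colors ignore dark mode")
        .expect("subject should carry a [RIVETER(...)] header");
    println!("{rejected}");
    println!(
        "counts as prior attempt for story 2: {}",
        is_rejected_attempt(&rejected, "task-priority-a1b2c3d4", 2)
    );
}
```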
If you want to throw an attempt away entirely instead of keeping it as
learning material, jj abandon <change_id> + manually editing prd.toml
still works, but `riveter reject` is the recommended path.

| Flag | Cheap & cheerful | Default (codex) | Default (claude) |
|---|---|---|---|
| `--thinking` | `low` | `high` | `xhigh` |
| `--model` | mini tier | `gpt-5.5` | `claude-opus-4-7` |

--model and --thinking defaults are now per-agent: codex →
gpt-5.5 + high, claude → claude-opus-4-7 + xhigh. Pass -m /
-t to override either.

    riveter run -r <runId> --fast

`--fast` is not a different model; it's a service-tier knob. For
codex it adds -c service_tier="fast" to the underlying invocation,
which per OpenAI's docs makes the same model generate tokens about 1.5×
faster for about 2.5× the cost. It does not change --model or
--thinking; combine it with any model/effort you want. For claude
there's no equivalent today and --fast is a no-op (logged with a
one-line warning).
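One way to picture the per-agent defaults and the `--fast` behavior is a pair of small lookup functions. The sketch below is illustrative only: `Agent`, `defaults`, `resolve`, and `fast_args` are hypothetical names, the mock agent is left out, and only the values stated above (the default models, the thinking levels, and the `-c service_tier="fast"` argument) come from this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Agent {
    Codex,
    Claude,
}

/// Per-agent (model, thinking) defaults as listed in the table above.
fn defaults(agent: Agent) -> (&'static str, &'static str) {
    match agent {
        Agent::Codex => ("gpt-5.5", "high"),
        Agent::Claude => ("claude-opus-4-7", "xhigh"),
    }
}

/// Applies `-m` / `-t` style overrides on top of the agent's defaults.
fn resolve(agent: Agent, model: Option<&str>, thinking: Option<&str>) -> (String, String) {
    let (default_model, default_thinking) = defaults(agent);
    (
        model.unwrap_or(default_model).to_string(),
        thinking.unwrap_or(default_thinking).to_string(),
    )
}

/// Extra arguments contributed by `--fast`. It never touches the model
/// or thinking level; for claude it contributes nothing.
fn fast_args(agent: Agent) -> Vec<String> {
    match agent {
        Agent::Codex => vec!["-c".to_string(), "service_tier=\"fast\"".to_string()],
        Agent::Claude => Vec::new(),
    }
}

fn main() {
    for agent in [Agent::Codex, Agent::Claude] {
        let (model, thinking) = resolve(agent, None, None);
        println!("{agent:?}: model={model} thinking={thinking} fast={:?}", fast_args(agent));
    }
}
```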
Defaults are deliberately on the expensive end — Ralph relies on each iteration being good enough that the loop doesn't waste turns. Dial down for cheaper exploration; dial up when stories keep failing acceptance.
The built-in mock agent has no LLM dependency. It walks the PRD
deterministically (touches one file per story, flips passes = true), making
it perfect for integration tests and quick sanity checks:

    riveter run -r <runId> --agent mock

Each run keeps its state on disk in this layout:

    ~/.riveter/runs/<runId>/
    ├── spec.md           # verbatim user input
    ├── prd.toml          # source of truth for progress (incl. per-story `notes`)
    ├── learnings.md      # append-only learnings log; agent reads + writes
    └── iterations/
        ├── 001/
        │   ├── prompt.txt    # exact bytes sent to the agent
        │   ├── stdout.log    # tee'd output
        │   ├── stderr.log
        │   └── exit.txt
        ├── 002/
        └── ...

learnings.md is created on the first iteration. Its ## Codebase Patterns
section is the agent's authoritative summary of project conventions it has
learned across iterations — the prompt template instructs every fresh agent
to read it first and append a new learnings block at the end.
Each [[stories]] entry in prd.toml also has a notes field (free-form
string, defaults to empty) that the agent is allowed to overwrite with
per-story scratch context — failing commands, design dilemmas, references —
so a future iteration on the same story can pick up where the previous one
left off.
Runs are never auto-deleted. Iteration dirs are append-only across
re-runs (numbering continues 003, 004, …). To clean up:

    ls -lt ~/.riveter/runs/                # oldest at the bottom
    rm -rf ~/.riveter/runs/<runId>         # one
    rm -rf ~/.riveter/runs/*               # all

Override the location with `RIVETER_STATE_DIR=/path/to/dir`.
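The path layout and the continuing iteration numbers can be approximated with the standard library alone. The helpers below (`state_dir`, `iteration_dir`, `next_iteration`) are hypothetical and only mirror what this section describes; in particular, falling back to `$HOME/.riveter` is an assumption about how the default location is resolved.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// State directory: `RIVETER_STATE_DIR` if set, else `$HOME/.riveter` (assumed default).
fn state_dir() -> PathBuf {
    if let Some(dir) = env::var_os("RIVETER_STATE_DIR") {
        return PathBuf::from(dir);
    }
    let home = env::var_os("HOME").unwrap_or_default();
    PathBuf::from(home).join(".riveter")
}

/// `<state>/runs/<runId>/iterations/NNN`, zero-padded to three digits.
fn iteration_dir(state: &Path, run_id: &str, n: u32) -> PathBuf {
    state
        .join("runs")
        .join(run_id)
        .join("iterations")
        .join(format!("{n:03}"))
}

/// Next iteration number given the existing directory names.
/// Non-numeric names are ignored; an empty list starts at 1.
fn next_iteration(existing: &[&str]) -> u32 {
    existing
        .iter()
        .filter_map(|name| name.parse::<u32>().ok())
        .max()
        .unwrap_or(0)
        + 1
}

fn main() {
    // A re-run that finds 001 and 002 on disk would continue with 003.
    let n = next_iteration(&["001", "002"]);
    let dir = iteration_dir(&state_dir(), "task-priority-a1b2c3d4", n);
    println!("{}", dir.display());
}
```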

    cargo build --release
    ./target/release/riveter --help

Tests:

    cargo test

The default integration test uses `--agent mock` so no LLM credentials are
required.

There is a separate, gated E2E test that exercises the `riveter-create-run`
skill + the loop with real codex against a temp jj repo. It's
`#[ignore]` by default because it costs money and takes ~90 s.

    RIVETER_E2E=1 cargo test --test e2e_codex -- --ignored --nocapture

Knobs (env vars, all optional):

| Var | Default | What |
|---|---|---|
| `RIVETER_E2E` | unset | master switch; test SKIPs if unset |
| `RIVETER_E2E_MODEL` | `gpt-5.5` | model passed to codex (must work in your codex auth) |
| `RIVETER_E2E_THINKING` | `low` | reasoning effort |
| `RIVETER_E2E_MAX_ITER` | `3` | hard cap on loop iterations |
| `RIVETER_E2E_TIMEOUT_S` | `600` | total wall-clock budget |

| Exit | Meaning |
|---|---|
| 0 | All stories passes = true |
| 10 | Agent exited non-zero |
| 12 | Agent exited 0 but jj working copy is clean |
| 13 | A jj subcommand failed |
| 14 | prd.toml missing/invalid after iteration |
| 20 | Hit --max-iterations with stories still pending |
| 30 | riveter validate: schema/layout errors |
| 31 | riveter validate: run folder missing |
| 32 | riveter validate: I/O error |
| 40 | riveter reject: no matching [RIVETER(...)] commit |
| 41 | riveter reject: target commit has unparseable subject |
| 130 | SIGINT/SIGTERM |
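For scripts or wrappers that need to branch on these codes, the run-loop subset of the table can be modeled as an enum. This is a sketch, not Riveter's source: the `Outcome` type and its variant names are invented for illustration, and the `validate` (30 to 32) and `reject` (40, 41) codes are omitted for brevity.

```rust
use std::process::ExitCode;

/// Hypothetical model of the `riveter run` outcomes from the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    AllStoriesPass,
    AgentFailed,
    CleanWorkingCopy,
    JjFailed,
    PrdInvalid,
    MaxIterations,
    Interrupted,
}

impl Outcome {
    /// Numeric exit code, matching the table row for each outcome.
    fn code(self) -> u8 {
        match self {
            Outcome::AllStoriesPass => 0,
            Outcome::AgentFailed => 10,
            Outcome::CleanWorkingCopy => 12,
            Outcome::JjFailed => 13,
            Outcome::PrdInvalid => 14,
            Outcome::MaxIterations => 20,
            Outcome::Interrupted => 130,
        }
    }
}

fn main() -> ExitCode {
    let all = [
        Outcome::AllStoriesPass,
        Outcome::AgentFailed,
        Outcome::CleanWorkingCopy,
        Outcome::JjFailed,
        Outcome::PrdInvalid,
        Outcome::MaxIterations,
        Outcome::Interrupted,
    ];
    for outcome in all {
        println!("{:>3}  {:?}", outcome.code(), outcome);
    }
    // A real binary would return the code for the outcome it actually hit.
    ExitCode::from(Outcome::AllStoriesPass.code())
}
```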