
rivet-dev/riveter


Ralph the Riveter

A focused reimplementation of the Ralph autonomous coding loop in Rust. Slimmer than the original bash version, with pluggable agents (codex, claude, or mock), per-iteration jj commits, and in-place execution (no branch switching).

The binary is named riveter (the tool); the methodology it implements is Ralph (one focused story per fresh agent context, persistent state on disk, jj as the review surface).


Why Ralph the Riveter

  • Fresh context per story. Each iteration spawns a new agent process with no memory of prior iterations. The agent's working set is exactly one user story + the codebase + the progress log — never the full transcript of past attempts. This keeps the context window small and focused, which is the single biggest lever on agent quality and cost.
  • Persistent state lives on disk, not in-context. prd.toml tracks which stories are done — the agent reads it (plus its assigned story) at the start of every run and flips passes = true when the story's acceptance criteria are met.
  • jj is the review surface. Because every iteration is one jj commit on @, you review Riveter's work the same way you review your own: jj log, jj diff -r <id>, jj abandon <id> to reject, jj squash/jj split to reshape. No branch dance, no PR ceremony, no merge conflicts with your in-flight work.
  • Pluggable agents. Codex and Claude are interchangeable — pick the model that's good at the kind of work the spec needs. Same loop, same artifacts, same review flow.

Intentional deviations from Ralph

Riveter follows the Ralph methodology faithfully, but a handful of design choices were deliberately changed from the original Ralph bash implementation:

  • No sentinel-based early-termination signal. Ralph greps the last 20 lines of agent output for <promise>COMPLETE</promise> and exits the loop on a match. That's brittle: agents fail early, get rate-limited, or echo the sentinel while repeating their own instructions, and the loop exits prematurely. Riveter terminates only when a re-read of prd.toml shows every story with passes = true. The agent flipping that flag is the one authoritative signal.

  • No feature branches. Ralph creates/checks out a branchName from the PRD on every iteration. In practice branches get in the way of stacking, in-flight human edits, and any kind of human-in-the-loop review — every iteration becomes a merge negotiation. Riveter works in-place on @ and never touches branches or bookmarks. You stay on whatever you were on, and Riveter's commits land on top of your work like any other commit.

  • jj instead of git. jj's inverted workflow (create a revision before editing, not after) maps cleanly onto an autonomous loop: each iteration is one jj describe + jj new, with no staging step to forget and no "did the agent commit?" ambiguity. Review becomes jj log -r 'description(glob:"[[]RIVETER*")'; rejection becomes riveter reject (which amends, not abandons). jj's lossless operation log also means a broken iteration is never destructive — you can always jj op undo.

  • prd.toml instead of prd.json. TOML allows comments, tolerates a trailing comma in arrays, and reads cleanly when humans inspect or edit it mid-run. It's still trivial for the agent to parse and round-trip via toml_edit without reformatting the human's hand-written sections.

  • Array order, not a priority field. Ralph stories carry an integer priority and the agent picks "highest priority pending". In practice the priority field is just a second representation of "what order should we do these in" — confusing for both humans (which number means more important, 1 or 10?) and agents (do priority ties break by id?). Riveter stories are executed strictly in array order. To reorder, you reorder the array. The id field is purely a stable reference for commit messages and riveter reject.

  • Quality-check / clean-fix / browser-test rules live in the prompt template, not external AGENTS.md / CLAUDE.md files. Ralph asks the agent to read+update AGENTS.md/CLAUDE.md files in nearby directories. Riveter inlines those rules directly into the rendered prompt every iteration. Trade-off: the rules aren't automatically loaded when the same agent works in your repo outside a Riveter run, and they aren't shared across runs. Win: the rules are guaranteed to be in the context window for every Riveter iteration, you don't have to maintain parallel AGENTS.md copies, and you can tweak them once in the Riveter template instead of per-project. Project-specific conventions still flow through learnings.md (per-run) and your existing AGENTS.md files (which the agent will read when it touches nearby code), so you don't lose anything important.
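
Two of the loop decisions above, picking the next story by array order and stopping only when prd.toml says every story passes, reduce to a few lines. A minimal sketch, assuming stories have already been parsed from prd.toml into the hypothetical `Story` struct below (Riveter's real types and TOML handling are not shown):

```rust
/// Hypothetical in-memory view of one `[[stories]]` entry in prd.toml.
/// `id` and `passes` are named in this README; `title` is assumed from the
/// commit-subject format, and `notes` is omitted for brevity.
#[allow(dead_code)]
#[derive(Debug, Clone)]
pub struct Story {
    pub id: u32,
    pub title: String,
    pub passes: bool,
}

/// Next story for a fresh agent: the first entry in array order whose
/// `passes` flag is still false. There is no priority field to consult.
pub fn next_story(stories: &[Story]) -> Option<&Story> {
    stories.iter().find(|s| !s.passes)
}

/// The only termination signal: after re-reading prd.toml, every story has
/// `passes = true`. Agent output is never scanned for a sentinel string.
pub fn run_complete(stories: &[Story]) -> bool {
    stories.iter().all(|s| s.passes)
}
```

Reordering work is then just reordering the array; the `id` never influences which story runs next.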


Quick start

0. Install the bundled riveter-create-run skill (one-time)

riveter install-skill

Stages the bundled skill at ~/.riveter/skill-staging/ and hands off to the interactive npx skills add UI — pick your agents, scope (project/global), and install mode there. Requires npx on PATH.

1. Generate a run folder

Inside any agent session (Codex, Claude, Amp, ...) that has access to the riveter-create-run skill installed above, feed it your spec:

Riveter create run: add high/medium/low priority to tasks (default medium); show colored badges; filter by priority.

The skill writes <state-dir>/runs/<runId>/{prd.toml,spec.md} and prints the runId, e.g. task-priority-a1b2c3d4.

2. (Optional) inspect / edit the PRD

$EDITOR ~/.riveter/runs/task-priority-a1b2c3d4/prd.toml

3. Run the loop in your jj repo

cd ~/projects/my-app                       # a jj-managed repo
riveter run -r task-priority-a1b2c3d4 \
            -a codex -m gpt-5.5 -t high

riveter never switches branches/bookmarks. It commits each iteration on @ and then opens a fresh empty commit on top. Resume by re-running the same command — passing stories are skipped.
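
Per iteration, that amounts to two jj calls: `jj describe -m` to set the message on @, then `jj new` to start a fresh empty change on top. A minimal sketch using `std::process::Command`, assuming `jj` is on PATH; the function names are hypothetical, and the real loop also runs the agent and records logs before this step:

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Argument lists for the two jj calls that close out an iteration:
/// describe the agent's work on @, then open a fresh empty change on top.
/// No branch or bookmark commands are involved.
pub fn iteration_jj_args(subject: &str) -> [Vec<String>; 2] {
    [
        vec!["describe".to_string(), "-m".to_string(), subject.to_string()],
        vec!["new".to_string()],
    ]
}

/// Run both calls inside `repo`, stopping at the first failure
/// (the README's exit code 13 covers "a jj subcommand failed").
pub fn commit_iteration(repo: &Path, subject: &str) -> io::Result<()> {
    for args in iteration_jj_args(subject) {
        let status = Command::new("jj").args(&args).current_dir(repo).status()?;
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("jj {} failed: {status}", args[0]),
            ));
        }
    }
    Ok(())
}
```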

By default Riveter does not stream the agent's chatter to your terminal. You'll see iteration banners, [agent], [jj], [prd], and [reject] progress lines from Riveter itself, plus a tail -f hint per iteration so you can watch the agent live if you want:

============================================================
  iteration 1/10 (#1) · #1 "Add priority field to tasks table"
============================================================
[agent] codex (model=gpt-5.5, thinking=high)
[agent] live output suppressed. To watch this step:
        tail -f ~/.riveter/runs/task-priority-a1b2c3d4/iterations/001/stdout.log
        tail -f ~/.riveter/runs/task-priority-a1b2c3d4/iterations/001/stderr.log

Pass --show-agent if you want the live │ ...-prefixed tee in your terminal anyway. The full transcript is always written to disk regardless.


Reviewing a run

jj log -r 'description(glob:"[[]RIVETER*")'   # all Riveter commits
jj diff -r <change_id>                        # one iteration
jj abandon <change_id>                        # reject a bad iteration

Commit subjects encode the run id, story id, and model:

[RIVETER(task-priority-a1b2c3d4,#2,gpt-5.5)] chore: Display priority badge on task cards
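
A sketch of how that subject line could be built and parsed back, for example by tooling that filters Riveter commits. The `RiveterSubject` struct and both functions are hypothetical; the sketch assumes run ids and model names contain no commas and that story ids are numeric, as in the examples here:

```rust
/// Fields encoded in a Riveter commit subject:
/// `[RIVETER(<runId>,#<storyId>,<model>)] chore: <title>`
#[derive(Debug, PartialEq)]
pub struct RiveterSubject {
    pub run_id: String,
    pub story_id: u32,
    pub model: String,
    pub title: String,
}

pub fn format_subject(s: &RiveterSubject) -> String {
    format!("[RIVETER({},#{},{})] chore: {}", s.run_id, s.story_id, s.model, s.title)
}

/// Parse a subject back into its fields. Returns None for anything else,
/// including `[RIVETER-REJECTED(...)]` subjects.
pub fn parse_subject(line: &str) -> Option<RiveterSubject> {
    let rest = line.strip_prefix("[RIVETER(")?;
    let (fields, title) = rest.split_once(")] chore: ")?;
    let mut parts = fields.splitn(3, ',');
    let run_id = parts.next()?.to_string();
    let story_id = parts.next()?.strip_prefix('#')?.parse::<u32>().ok()?;
    let model = parts.next()?.to_string();
    Some(RiveterSubject { run_id, story_id, model, title: title.to_string() })
}
```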

Rejecting bad iterations

Instead of jj abandon-ing the bad commit (which throws away the agent's attempt entirely), Riveter has a first-class reject workflow that keeps the rejected commit in history and surfaces it to the next iteration as prompt context so the agent can learn from what was tried:

riveter reject -r <runId> [-c <change_id>] [-m "reason for rejection"]

What it does:

  1. Resolves the target jj change. If -c is omitted, it picks the most recent [RIVETER(<runId>,...)] commit reachable from @.
  2. Amends that commit's description from [RIVETER(<runId>,#<id>,<model>)] chore: <title> to [RIVETER-REJECTED(<runId>,#<id>,<model>)] chore: <title> | rejected: <reason>, with a reviewer note appended to the body. The commit and its diff stay in history.
  3. Flips that story's passes flag back to false in prd.toml.
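
Steps 1 and 2 can be sketched as a pure function over commit subjects, newest first, for the default case where -c is omitted. This is an illustrative assumption, not Riveter's code: it does not call jj, append the reviewer note to the body, or edit prd.toml, and `reject_latest` / `RejectError` are hypothetical names whose two variants mirror exit codes 40 and 41:

```rust
/// Failure modes of the sketch; they mirror `riveter reject` exit codes.
#[derive(Debug, PartialEq, Eq)]
pub enum RejectError {
    /// Exit 40: no `[RIVETER(<runId>,...)]` commit found.
    NoMatchingCommit,
    /// Exit 41: the matched subject could not be parsed.
    UnparseableSubject,
}

/// `log` holds (change_id, subject) pairs, newest first, standing in for
/// commits reachable from @. Returns the chosen change id and the amended
/// subject line.
pub fn reject_latest<'a>(
    log: &[(&'a str, &'a str)],
    run_id: &str,
    reason: &str,
) -> Result<(&'a str, String), RejectError> {
    // Step 1: most recent accepted commit for this run. The trailing comma
    // stops "run-a" from also matching "run-ab".
    let prefix = format!("[RIVETER({run_id},");
    let &(change_id, subject) = log
        .iter()
        .find(|(_, subj)| subj.starts_with(&prefix))
        .ok_or(RejectError::NoMatchingCommit)?;
    // Step 2: keep run id, story id, model, and title; swap the tag and
    // append the rejection reason.
    let (fields, title) = subject
        .strip_prefix("[RIVETER(")
        .and_then(|rest| rest.split_once(")] chore: "))
        .ok_or(RejectError::UnparseableSubject)?;
    Ok((
        change_id,
        format!("[RIVETER-REJECTED({fields})] chore: {title} | rejected: {reason}"),
    ))
}
```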

On the next riveter run -r <runId>, the loop scans jj log for any [RIVETER-REJECTED(<runId>,#<storyId>,*)] commits reachable from @ and includes each one's description + diff in the prompt under a "Previously rejected attempts on this story" section. The agent is instructed to read them and avoid repeating the same mistakes.
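
Selecting the right rejected commits hinges on matching the full `<runId>,#<storyId>,` prefix, so that story #1 does not pick up rejections for #12. A minimal sketch, assuming descriptions and diffs have already been read from jj; the `RejectedAttempt` type and the markdown layout of the rendered section are hypothetical:

```rust
/// A rejected attempt as read from jj (hypothetical shape).
pub struct RejectedAttempt {
    pub description: String,
    pub diff: String,
}

/// Build the "Previously rejected attempts on this story" prompt section
/// from attempts whose description starts with
/// `[RIVETER-REJECTED(<runId>,#<storyId>,`. Returns None when there are none.
pub fn rejected_section(attempts: &[RejectedAttempt], run_id: &str, story_id: u32) -> Option<String> {
    let prefix = format!("[RIVETER-REJECTED({run_id},#{story_id},");
    let matching: Vec<&RejectedAttempt> = attempts
        .iter()
        .filter(|a| a.description.starts_with(&prefix))
        .collect();
    if matching.is_empty() {
        return None;
    }
    let mut out = String::from("## Previously rejected attempts on this story\n");
    for (i, a) in matching.iter().enumerate() {
        // Each attempt contributes its full description plus its diff.
        out.push_str(&format!(
            "\n### Attempt {}\n\n{}\n\n{}\n",
            i + 1,
            a.description.trim_end(),
            a.diff.trim_end()
        ));
    }
    Some(out)
}
```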

If you want to throw an attempt away entirely instead of keeping it as learning material, jj abandon <change_id> + manually editing prd.toml still works — but riveter reject is the recommended path.


Tuning cost vs. quality

Flag          Cheap & cheerful   Default (codex)   Default (claude)
--thinking    low                high              xhigh
--model       mini tier          gpt-5.5           claude-opus-4-7

--model and --thinking defaults are per-agent: codex → gpt-5.5 + high, claude → claude-opus-4-7 + xhigh. Pass -m / -t to override either.
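
The default-then-override rule is small enough to show directly. A sketch with hypothetical names, using the per-agent defaults from the table above (the mock agent is left out because no defaults are documented for it):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentKind {
    Codex,
    Claude,
}

/// Resolve (model, thinking) for an agent. `-m` / `-t` values win when
/// given; otherwise the per-agent defaults apply.
pub fn resolve_model_and_thinking(
    agent: AgentKind,
    model: Option<&str>,
    thinking: Option<&str>,
) -> (String, String) {
    let (default_model, default_thinking) = match agent {
        AgentKind::Codex => ("gpt-5.5", "high"),
        AgentKind::Claude => ("claude-opus-4-7", "xhigh"),
    };
    (
        model.unwrap_or(default_model).to_string(),
        thinking.unwrap_or(default_thinking).to_string(),
    )
}
```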

--fast (fast service tier)

riveter run -r <runId> --fast

--fast is not a different model — it's a service-tier knob. For codex it adds -c service_tier="fast" to the underlying invocation, which per OpenAI's docs makes the same model generate tokens about 1.5× faster for about 2.5× the cost. It does not change --model or --thinking; combine it with any model/effort you want. For claude there's no equivalent today and --fast is a no-op (logged with a one-line warning).
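
A sketch of the flag handling, assuming --fast only ever appends the override quoted above to the codex invocation and that the value is passed verbatim as one argv entry (an assumption about quoting). The function name and warning text are hypothetical; the README says only that claude logs a one-line warning:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FastAgent {
    Codex,
    Claude,
}

/// Extra CLI arguments contributed by `--fast`. For codex: the config
/// override `-c service_tier="fast"`. Model and thinking are not touched.
/// For claude: nothing, plus a one-line warning.
pub fn fast_tier_args(agent: FastAgent, fast: bool) -> Vec<String> {
    if !fast {
        return Vec::new();
    }
    match agent {
        FastAgent::Codex => vec!["-c".to_string(), r#"service_tier="fast""#.to_string()],
        FastAgent::Claude => {
            // Hypothetical wording; only the existence of a warning is documented.
            eprintln!("warning: --fast has no claude equivalent; ignoring");
            Vec::new()
        }
    }
}
```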

Defaults are deliberately on the expensive end — Ralph relies on each iteration being good enough that the loop doesn't waste turns. Dial down for cheaper exploration; dial up when stories keep failing acceptance.


--agent mock (for hacking on Riveter)

The built-in mock agent has no LLM dependency. It walks the PRD deterministically (touches one file per story, flips passes = true), making it perfect for integration tests and quick sanity checks:

riveter run -r <runId> --agent mock
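
The mock's behavior is simple enough to sketch in full. This assumes an in-memory story list rather than a real prd.toml, and the per-story file name is a hypothetical choice. Writing a file presumably matters because it leaves a change in the jj working copy; an agent that exits 0 with a clean working copy triggers exit code 12:

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, Clone)]
pub struct MockStory {
    pub id: u32,
    pub passes: bool,
}

/// One deterministic mock iteration: take the first pending story in array
/// order, touch one file for it in `repo`, and flip `passes` to true.
/// Returns the id handled, or None when nothing is pending.
pub fn mock_iteration(repo: &Path, stories: &mut [MockStory]) -> io::Result<Option<u32>> {
    let Some(story) = stories.iter_mut().find(|s| !s.passes) else {
        return Ok(None);
    };
    // Hypothetical file name: riveter-mock-<id>.txt
    let path = repo.join(format!("riveter-mock-{}.txt", story.id));
    fs::write(path, format!("story {}\n", story.id))?;
    story.passes = true;
    Ok(Some(story.id))
}
```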

Layout

~/.riveter/runs/<runId>/
  ├── spec.md               # verbatim user input
  ├── prd.toml              # source of truth for progress (incl. per-story `notes`)
  ├── learnings.md          # append-only learnings log; agent reads + writes
  └── iterations/
      ├── 001/
      │   ├── prompt.txt    # exact bytes sent to the agent
      │   ├── stdout.log    # tee'd output
      │   ├── stderr.log
      │   └── exit.txt
      ├── 002/
      └── ...

learnings.md is created on the first iteration. Its ## Codebase Patterns section is the agent's authoritative summary of project conventions it has learned across iterations — the prompt template instructs every fresh agent to read it first and append a new learnings block at the end.

Each [[stories]] entry in prd.toml also has a notes field (free-form string, defaults to empty) that the agent is allowed to overwrite with per-story scratch context — failing commands, design dilemmas, references — so a future iteration on the same story can pick up where the previous one left off.

Runs are never auto-deleted. Iteration dirs are append-only across re-runs (numbering continues 003, 004, …). To clean up:

ls -lt ~/.riveter/runs/                  # oldest at the bottom
rm -rf ~/.riveter/runs/<runId>           # one
rm -rf ~/.riveter/runs/*                 # all

Override the location with RIVETER_STATE_DIR=/path/to/dir.
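
Resolving these paths can be sketched as follows. It assumes the default root is $HOME/.riveter (as in the examples above), that an empty RIVETER_STATE_DIR counts as unset (an assumption), and that the next iteration number is one past the highest existing numeric directory name, which is what makes numbering continue across re-runs. Both function names are hypothetical:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// State root: RIVETER_STATE_DIR if set and non-empty, else <home>/.riveter.
/// A caller would pass `std::env::var("RIVETER_STATE_DIR").ok().as_deref()`.
pub fn state_dir(override_dir: Option<&str>, home: &str) -> PathBuf {
    match override_dir {
        Some(d) if !d.is_empty() => PathBuf::from(d),
        _ => Path::new(home).join(".riveter"),
    }
}

/// Next directory under runs/<runId>/iterations/, zero-padded to three
/// digits. Non-numeric entries are ignored.
pub fn next_iteration_dir(state: &Path, run_id: &str) -> io::Result<PathBuf> {
    let iters = state.join("runs").join(run_id).join("iterations");
    let mut max = 0u32;
    if iters.is_dir() {
        for entry in fs::read_dir(&iters)? {
            let name = entry?.file_name();
            if let Some(n) = name.to_str().and_then(|s| s.parse::<u32>().ok()) {
                max = max.max(n);
            }
        }
    }
    Ok(iters.join(format!("{:03}", max + 1)))
}
```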


Building

cargo build --release
./target/release/riveter --help

Tests:

cargo test

The default integration test uses --agent mock so no LLM credentials are required.

End-to-end test (real codex)

There is a separate, gated E2E test that exercises the riveter-create-run skill + the loop with real codex against a temp jj repo. It's #[ignore] by default because it costs money and takes ~90 s.

RIVETER_E2E=1 cargo test --test e2e_codex -- --ignored --nocapture

Knobs (env vars, all optional):

Var                     Default   What
RIVETER_E2E             unset     master switch; the test SKIPs if unset
RIVETER_E2E_MODEL       gpt-5.5   model passed to codex (must work with your codex auth)
RIVETER_E2E_THINKING    low       reasoning effort
RIVETER_E2E_MAX_ITER    3         hard cap on loop iterations
RIVETER_E2E_TIMEOUT_S   600       total wall-clock budget, in seconds
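
How a test might read those knobs, as a sketch: lookups go through a closure so the logic can be exercised without touching the process environment, unparseable numbers fall back to the defaults (an assumption), and `e2e_config` / `E2eConfig` are hypothetical names rather than the test's actual code:

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub struct E2eConfig {
    pub model: String,
    pub thinking: String,
    pub max_iter: u32,
    pub timeout: Duration,
}

/// Read the E2E knobs through `get` (e.g. `|k| std::env::var(k).ok()`).
/// Returns None when RIVETER_E2E is unset, meaning the test should skip.
pub fn e2e_config(get: impl Fn(&str) -> Option<String>) -> Option<E2eConfig> {
    get("RIVETER_E2E")?;
    // Numeric knobs: parse if present and valid, else use the default.
    let num = |key: &str, default: u64| get(key).and_then(|v| v.parse().ok()).unwrap_or(default);
    Some(E2eConfig {
        model: get("RIVETER_E2E_MODEL").unwrap_or_else(|| "gpt-5.5".to_string()),
        thinking: get("RIVETER_E2E_THINKING").unwrap_or_else(|| "low".to_string()),
        max_iter: num("RIVETER_E2E_MAX_ITER", 3) as u32,
        timeout: Duration::from_secs(num("RIVETER_E2E_TIMEOUT_S", 600)),
    })
}
```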

Exit codes

Exit   Meaning
0      All stories have passes = true
10     Agent exited non-zero
12     Agent exited 0 but the jj working copy is clean
13     A jj subcommand failed
14     prd.toml missing or invalid after an iteration
20     Hit --max-iterations with stories still pending
30     riveter validate: schema/layout errors
31     riveter validate: run folder missing
32     riveter validate: I/O error
40     riveter reject: no matching [RIVETER(...)] commit
41     riveter reject: target commit has an unparseable subject
130    SIGINT/SIGTERM
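
If you script around Riveter or hack on it, it can help to mirror the exit-code table as an enum. The variant names below are hypothetical; only the numbers come from the table:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiveterExit {
    AllPass,
    AgentFailed,
    NoChanges,
    JjFailed,
    PrdInvalid,
    MaxIterations,
    ValidateSchema,
    ValidateMissing,
    ValidateIo,
    RejectNoMatch,
    RejectBadSubject,
    Interrupted,
}

impl RiveterExit {
    /// Process exit code for each outcome, as documented in the table.
    pub fn code(self) -> i32 {
        match self {
            RiveterExit::AllPass => 0,
            RiveterExit::AgentFailed => 10,
            RiveterExit::NoChanges => 12,
            RiveterExit::JjFailed => 13,
            RiveterExit::PrdInvalid => 14,
            RiveterExit::MaxIterations => 20,
            RiveterExit::ValidateSchema => 30,
            RiveterExit::ValidateMissing => 31,
            RiveterExit::ValidateIo => 32,
            RiveterExit::RejectNoMatch => 40,
            RiveterExit::RejectBadSubject => 41,
            RiveterExit::Interrupted => 130,
        }
    }
}
```

A caller would end with `std::process::exit(RiveterExit::MaxIterations.code())`. Code 130 follows the shell convention of 128 plus the signal number for SIGINT (2).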

About

Ralph the Riveter — an autonomous coding loop using jj and pluggable agents.
