feat(loop): <promise> completion tag for early exit by paulnsorensen · Pull Request #46 · computerlovetech/ralphify

paulnsorensen · 2026-04-17T21:22:41Z

Summary

Adds opt-in early-exit to the loop via a structured completion tag. When
stop_on_completion_signal: true is set in frontmatter and the agent
emits a matching <promise>TEXT</promise> tag in its output, ralphify
stops after the current iteration instead of continuing to the iteration
budget. The default inner text is RALPH_PROMISE_COMPLETE; override it
with the optional completion_signal field.

Why the `<promise>` tag format

The tag format intentionally mirrors what the Claude Code ralphify
plugin (Ralph-Wiggum) already uses to signal structured completion
back to its harness. Aligning on the same <promise>...</promise> shape
means a single prompt body works on both sides — a ralph written against
the Claude plugin can stop a ralphify loop with no adapter code, and
vice versa. Ralphify still runs its own command/prompt loop; only the
completion-tag format is shared.

What's new

src/ralphify/_promise.py — strict <promise>TEXT</promise> parser
with whitespace normalization (bare marker text no longer counts)
Two optional frontmatter fields:
- completion_signal — inner text of the promise tag (default
  RALPH_PROMISE_COMPLETE)
- stop_on_completion_signal — bool gate for early exit (default
  false)
examples/promise-completion/RALPH.md — canonical example, surfaced
from ralph scaffold help and the cookbook
ralph run --help and ralph scaffold --help now explain the pattern
Docs updated across getting-started, cli, how-it-works, agents,
cookbook, quick-reference
Regression coverage in tests/test_promise.py,
tests/test_agent.py, tests/test_cli.py, tests/test_engine.py
(changed modules land at ~98% coverage)

Backwards compatibility

Default behavior is unchanged. Loops still run to their iteration budget
unless both stop_on_completion_signal: true and the configured tag
are present. Existing ralphs are untouched.

Test plan

uv run pytest — 661 passed, 1 xpassed
uv run ruff check .
uv run ruff format --check .
uv run ty check
uv run mkdocs build --strict
Example ralph in examples/promise-completion/ exits early on
<promise>COMPLETE</promise>

Note on help-text test robustness

tests/test_cli.py::TestHelp asserts that <promise>...</promise>
appears in --help output. Rich wraps help text differently in headless
CI vs. a local TTY, which splits the tag across lines and breaks a
naïve substring check. The tests strip ANSI codes and collapse
whitespace before asserting, so they're stable on both sides.

Adds opt-in early-exit to the loop via a structured completion tag. When stop_on_completion_signal: true is set in frontmatter and the agent emits a matching <TEXT> tag in its output, ralphify stops after the current iteration instead of continuing to the iteration budget. - _promise.py: strict TEXT parser with whitespace normalization - Two optional frontmatter fields: completion_signal and stop_on_completion_signal - examples/promise-completion/RALPH.md: canonical example - Docs updated across getting-started, cli, how-it-works, agents, cookbook, quick-reference - Tests in test_promise.py, test_agent.py, test_cli.py, test_engine.py (~98% coverage)

#6) * feat: seed CLI adapter layer with Claude, Codex, Copilot adapters Introduces a pluggable CLIAdapter Protocol and lands the three concrete adapters that exercise it, so the abstraction ships with demonstrated value on day one. - src/ralphify/adapters/__init__.py — CLIAdapter Protocol, AdapterEvent, first-match ADAPTERS registry, select_adapter() dispatch with GenericAdapter fallback. - src/ralphify/adapters/_generic.py — blocking-path fallback adapter. - src/ralphify/adapters/claude.py — stem "claude", --output-format stream-json --verbose, parses tool_use blocks in assistant messages, extracts completion signal from terminal result event. - src/ralphify/adapters/codex.py — stem "codex", --json, maps CollabToolCall/McpToolCall/CommandExecution to tool_use events, scans TurnCompleted/TaskComplete for promise tags. - src/ralphify/adapters/copilot.py — stem "copilot" (not "gh"), --output-format json, renders_structured=False, supports_soft_windown =False. install_wind_down_hook is part of the Protocol surface but every concrete adapter currently raises NotImplementedError — the actual hook plumbing (soft wind-down, AgentHook fanout, max_turns enforcement) lands in subsequent PRs. This PR is pure additions with no engine wiring: adapters are registered and discoverable, but the run loop still uses the old hard-coded Claude path. Tests: 50 new tests covering each adapter's event parsing, command building, completion-signal extraction, and registry dispatch. Co-Authored-By: WOZCODE <contact@withwoz.com> * feat: route Claude detection and promise completion through adapter Wire the CLI adapter layer into the three sites that previously hard-coded Claude knowledge: - _agent.py: execute_agent resolves the adapter (or accepts one from the caller) and delegates flag construction to adapter.build_command. _run_agent_streaming no longer appends --output-format/--verbose itself — the cmd it receives is already flagged. - _console_emitter.py: startup-hint text picks "live activity on" vs "live output on" via adapter.renders_structured instead of a claude-binary check. - engine.py: promise completion runs through adapter.extract_completion_signal(captured_stdout, signal). ClaudeAdapter scans the terminal result event only; GenericAdapter scans full stdout to preserve behavior for unknown CLIs. The engine-side _record_promise_fragment incremental scanner is gone. Tightens ClaudeAdapter.extract_completion_signal to return False when no result event is present so embedded <promise> tags inside status or assistant JSON can't trigger early completion. * fix: address copilot review feedback on adapter seed PR - Rename `supports_soft_windown` → `supports_soft_wind_down` across the Protocol and all adapters to match the `install_wind_down_hook` / "wind-down" terminology used elsewhere (caught before it baked into the public adapter surface). - Claude and Copilot `build_command()` now detect the `--output-format` flag pair and overwrite a user-supplied value instead of treating the flag and value tokens independently (which produced invalid argv when a caller pre-set `--output-format <other>`). - Align Claude `parse_event()` docstring with actual behavior: `result` events return `kind="result"` and non-assistant events return `kind="message"` rather than `None`. - Flip Codex `renders_structured` to False so the peek panel stays in raw-line mode until the emitter is adapter-driven; Codex's streaming path currently feeds an `_IterationPanel` that only understands Claude's stream-json schema. * refactor: split renders_structured into supports_streaming and renders_structured_peek Addresses the architectural concern in PR #6 review: the old `renders_structured` flag conflated two orthogonal capabilities: 1. Whether the adapter emits a structured JSON stream that the execution path can parse into on_activity callbacks. 2. Whether the console peek panel understands the adapter's event schema and should render it via `_IterationPanel` instead of raw lines. Before the split, an adapter had to choose between a useful streaming path with an empty peek panel (Codex's previous state) or a working raw peek with no activity callbacks. The split makes the correct combination representable: - ClaudeAdapter: supports_streaming=True, renders_structured_peek=True - CodexAdapter: supports_streaming=True, renders_structured_peek=False - CopilotAdapter / GenericAdapter: both False Codex now takes the streaming path again (on_activity callbacks fire) while the peek panel stays in raw-line mode until the emitter is adapter-driven. Callers updated: - _agent.execute_agent dispatches on adapter.supports_streaming - _console_emitter consumes adapter.renders_structured_peek via _agent_renders_structured_peek (renamed) - tests/conftest autouse fixture forces both flags to False * perf: skip stdout capture when adapter exposes result_text The previous adapter seed forced capture_stdout=True on every iteration so the engine could feed the buffered transcript into adapter.extract_completion_signal. For verbose streaming agents this regressed memory vs the prior tail-scan approach, and Claude in particular was paying for a full transcript buffer it did not need: agent.result_text already carries the terminal assistant message. Add a requires_full_stdout_for_completion capability flag and a keyword-only extract_completion_signal(result_text=, stdout=, user_signal=) signature so each adapter declares its own input. Claude sets the flag False and reads result_text directly; Codex / Copilot / Generic keep the full-stdout path. The engine now gates capture on log_dir or (stop_on_completion_signal AND requires_full_stdout_for_completion), so streaming agents skip the buffer entirely unless logging is on, and non-streaming agents only buffer when the user opts into completion signalling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: WOZCODE <contact@withwoz.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

) * refactor: extract adapter Protocol and registry into _protocol.py The adapter package's __init__ both imports concrete adapter modules (to populate ADAPTERS) and defines the Protocol those modules depend on. That's a latent circular-import hazard: it works today only because __init__ defines the Protocol before it imports concrete adapters. As soon as an adapter grows a dependency that could re-enter the package namespace, the order-of-execution becomes load-bearing. Split the Protocol, the AdapterEvent NamedTuple, the Literal kind enums, and the ADAPTERS list into a new leaf module _protocol.py. Concrete adapters (claude, codex, copilot, _generic) now import from _protocol directly. __init__.py re-exports the public names so existing `from ralphify.adapters import CLIAdapter` imports keep working, and keeps select_adapter plus the builtin-registration bootstrap. No behaviour change. * feat: add turn-cap event types and max_turns frontmatter fields Preparatory foundation for the forthcoming turn-cap enforcement work: no enforcement yet, just the event-type surface and the validated RunConfig fields that enforcement will read. - _events.py: TOOL_USE, ITERATION_TURN_APPROACHING_LIMIT, and ITERATION_TURN_CAPPED event kinds with typed data payloads (ToolUseData, TurnApproachingLimitData, TurnCappedData) threaded through the EventData union. - _frontmatter.py: FIELD_MAX_TURNS and FIELD_MAX_TURNS_GRACE constants. - _run_types.py: max_turns (int | None, default None) and max_turns_grace (int, default 2) fields on RunConfig. - cli.py: _validate_max_turns / _validate_max_turns_grace helpers with range and type checks; grace must be strictly less than the cap when the cap is set. _build_run_config threads both values into the returned RunConfig. - tests/test_cli_frontmatter_fields.py: unit coverage for the validators plus end-to-end _build_run_config assertions for the default, valid, and invalid cases. Nothing consumes max_turns yet; it will be read by the adapter-driven enforcement path in a follow-up PR. * fix: mock shutil.which in frontmatter field tests CI runners don't have 'claude' on PATH, so _build_run_config's agent validation was failing before reaching the turn-cap assertions.

paulnsorensen force-pushed the main branch from 6523677 to 85ff945 Compare April 18, 2026 21:22

paulnsorensen and others added 2 commits April 18, 2026 20:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(loop): <promise> completion tag for early exit#46

feat(loop): <promise> completion tag for early exit#46
paulnsorensen wants to merge 3 commits intocomputerlovetech:mainfrom
paulnsorensen:main

paulnsorensen commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

paulnsorensen commented Apr 17, 2026

Summary

Why the <promise> tag format

What's new

Backwards compatibility

Test plan

Note on help-text test robustness

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Why the `<promise>` tag format