feat(loop): <promise> completion tag for early exit#46
Open
paulnsorensen wants to merge 3 commits intocomputerlovetech:mainfrom
Open
feat(loop): <promise> completion tag for early exit#46paulnsorensen wants to merge 3 commits intocomputerlovetech:mainfrom
paulnsorensen wants to merge 3 commits intocomputerlovetech:mainfrom
Conversation
Adds opt-in early-exit to the loop via a structured completion tag. When stop_on_completion_signal: true is set in frontmatter and the agent emits a matching <TEXT> tag in its output, ralphify stops after the current iteration instead of continuing to the iteration budget. - _promise.py: strict TEXT parser with whitespace normalization - Two optional frontmatter fields: completion_signal and stop_on_completion_signal - examples/promise-completion/RALPH.md: canonical example - Docs updated across getting-started, cli, how-it-works, agents, cookbook, quick-reference - Tests in test_promise.py, test_agent.py, test_cli.py, test_engine.py (~98% coverage)
#6) * feat: seed CLI adapter layer with Claude, Codex, Copilot adapters Introduces a pluggable CLIAdapter Protocol and lands the three concrete adapters that exercise it, so the abstraction ships with demonstrated value on day one. - src/ralphify/adapters/__init__.py — CLIAdapter Protocol, AdapterEvent, first-match ADAPTERS registry, select_adapter() dispatch with GenericAdapter fallback. - src/ralphify/adapters/_generic.py — blocking-path fallback adapter. - src/ralphify/adapters/claude.py — stem "claude", --output-format stream-json --verbose, parses tool_use blocks in assistant messages, extracts completion signal from terminal result event. - src/ralphify/adapters/codex.py — stem "codex", --json, maps CollabToolCall/McpToolCall/CommandExecution to tool_use events, scans TurnCompleted/TaskComplete for promise tags. - src/ralphify/adapters/copilot.py — stem "copilot" (not "gh"), --output-format json, renders_structured=False, supports_soft_windown =False. install_wind_down_hook is part of the Protocol surface but every concrete adapter currently raises NotImplementedError — the actual hook plumbing (soft wind-down, AgentHook fanout, max_turns enforcement) lands in subsequent PRs. This PR is pure additions with no engine wiring: adapters are registered and discoverable, but the run loop still uses the old hard-coded Claude path. Tests: 50 new tests covering each adapter's event parsing, command building, completion-signal extraction, and registry dispatch. Co-Authored-By: WOZCODE <contact@withwoz.com> * feat: route Claude detection and promise completion through adapter Wire the CLI adapter layer into the three sites that previously hard-coded Claude knowledge: - _agent.py: execute_agent resolves the adapter (or accepts one from the caller) and delegates flag construction to adapter.build_command. _run_agent_streaming no longer appends --output-format/--verbose itself — the cmd it receives is already flagged. - _console_emitter.py: startup-hint text picks "live activity on" vs "live output on" via adapter.renders_structured instead of a claude-binary check. - engine.py: promise completion runs through adapter.extract_completion_signal(captured_stdout, signal). ClaudeAdapter scans the terminal result event only; GenericAdapter scans full stdout to preserve behavior for unknown CLIs. The engine-side _record_promise_fragment incremental scanner is gone. Tightens ClaudeAdapter.extract_completion_signal to return False when no result event is present so embedded <promise> tags inside status or assistant JSON can't trigger early completion. * fix: address copilot review feedback on adapter seed PR - Rename `supports_soft_windown` → `supports_soft_wind_down` across the Protocol and all adapters to match the `install_wind_down_hook` / "wind-down" terminology used elsewhere (caught before it baked into the public adapter surface). - Claude and Copilot `build_command()` now detect the `--output-format` flag pair and overwrite a user-supplied value instead of treating the flag and value tokens independently (which produced invalid argv when a caller pre-set `--output-format <other>`). - Align Claude `parse_event()` docstring with actual behavior: `result` events return `kind="result"` and non-assistant events return `kind="message"` rather than `None`. - Flip Codex `renders_structured` to False so the peek panel stays in raw-line mode until the emitter is adapter-driven; Codex's streaming path currently feeds an `_IterationPanel` that only understands Claude's stream-json schema. * refactor: split renders_structured into supports_streaming and renders_structured_peek Addresses the architectural concern in PR #6 review: the old `renders_structured` flag conflated two orthogonal capabilities: 1. Whether the adapter emits a structured JSON stream that the execution path can parse into on_activity callbacks. 2. Whether the console peek panel understands the adapter's event schema and should render it via `_IterationPanel` instead of raw lines. Before the split, an adapter had to choose between a useful streaming path with an empty peek panel (Codex's previous state) or a working raw peek with no activity callbacks. The split makes the correct combination representable: - ClaudeAdapter: supports_streaming=True, renders_structured_peek=True - CodexAdapter: supports_streaming=True, renders_structured_peek=False - CopilotAdapter / GenericAdapter: both False Codex now takes the streaming path again (on_activity callbacks fire) while the peek panel stays in raw-line mode until the emitter is adapter-driven. Callers updated: - _agent.execute_agent dispatches on adapter.supports_streaming - _console_emitter consumes adapter.renders_structured_peek via _agent_renders_structured_peek (renamed) - tests/conftest autouse fixture forces both flags to False * perf: skip stdout capture when adapter exposes result_text The previous adapter seed forced capture_stdout=True on every iteration so the engine could feed the buffered transcript into adapter.extract_completion_signal. For verbose streaming agents this regressed memory vs the prior tail-scan approach, and Claude in particular was paying for a full transcript buffer it did not need: agent.result_text already carries the terminal assistant message. Add a requires_full_stdout_for_completion capability flag and a keyword-only extract_completion_signal(result_text=, stdout=, user_signal=) signature so each adapter declares its own input. Claude sets the flag False and reads result_text directly; Codex / Copilot / Generic keep the full-stdout path. The engine now gates capture on log_dir or (stop_on_completion_signal AND requires_full_stdout_for_completion), so streaming agents skip the buffer entirely unless logging is on, and non-streaming agents only buffer when the user opts into completion signalling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: WOZCODE <contact@withwoz.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
) * refactor: extract adapter Protocol and registry into _protocol.py The adapter package's __init__ both imports concrete adapter modules (to populate ADAPTERS) and defines the Protocol those modules depend on. That's a latent circular-import hazard: it works today only because __init__ defines the Protocol before it imports concrete adapters. As soon as an adapter grows a dependency that could re-enter the package namespace, the order-of-execution becomes load-bearing. Split the Protocol, the AdapterEvent NamedTuple, the Literal kind enums, and the ADAPTERS list into a new leaf module _protocol.py. Concrete adapters (claude, codex, copilot, _generic) now import from _protocol directly. __init__.py re-exports the public names so existing `from ralphify.adapters import CLIAdapter` imports keep working, and keeps select_adapter plus the builtin-registration bootstrap. No behaviour change. * feat: add turn-cap event types and max_turns frontmatter fields Preparatory foundation for the forthcoming turn-cap enforcement work: no enforcement yet, just the event-type surface and the validated RunConfig fields that enforcement will read. - _events.py: TOOL_USE, ITERATION_TURN_APPROACHING_LIMIT, and ITERATION_TURN_CAPPED event kinds with typed data payloads (ToolUseData, TurnApproachingLimitData, TurnCappedData) threaded through the EventData union. - _frontmatter.py: FIELD_MAX_TURNS and FIELD_MAX_TURNS_GRACE constants. - _run_types.py: max_turns (int | None, default None) and max_turns_grace (int, default 2) fields on RunConfig. - cli.py: _validate_max_turns / _validate_max_turns_grace helpers with range and type checks; grace must be strictly less than the cap when the cap is set. _build_run_config threads both values into the returned RunConfig. - tests/test_cli_frontmatter_fields.py: unit coverage for the validators plus end-to-end _build_run_config assertions for the default, valid, and invalid cases. Nothing consumes max_turns yet; it will be read by the adapter-driven enforcement path in a follow-up PR. * fix: mock shutil.which in frontmatter field tests CI runners don't have 'claude' on PATH, so _build_run_config's agent validation was failing before reaching the turn-cap assertions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds opt-in early-exit to the loop via a structured completion tag. When
stop_on_completion_signal: trueis set in frontmatter and the agentemits a matching
<promise>TEXT</promise>tag in its output, ralphifystops after the current iteration instead of continuing to the iteration
budget. The default inner text is
RALPH_PROMISE_COMPLETE; override itwith the optional
completion_signalfield.Why the
<promise>tag formatThe tag format intentionally mirrors what the Claude Code ralphify
plugin (Ralph-Wiggum) already uses to signal structured completion
back to its harness. Aligning on the same
<promise>...</promise>shapemeans a single prompt body works on both sides — a ralph written against
the Claude plugin can stop a ralphify loop with no adapter code, and
vice versa. Ralphify still runs its own command/prompt loop; only the
completion-tag format is shared.
What's new
src/ralphify/_promise.py— strict<promise>TEXT</promise>parserwith whitespace normalization (bare marker text no longer counts)
completion_signal— inner text of the promise tag (defaultRALPH_PROMISE_COMPLETE)stop_on_completion_signal— bool gate for early exit (defaultfalse)examples/promise-completion/RALPH.md— canonical example, surfacedfrom
ralph scaffoldhelp and the cookbookralph run --helpandralph scaffold --helpnow explain the patterncookbook, quick-reference
tests/test_promise.py,tests/test_agent.py,tests/test_cli.py,tests/test_engine.py(changed modules land at ~98% coverage)
Backwards compatibility
Default behavior is unchanged. Loops still run to their iteration budget
unless both
stop_on_completion_signal: trueand the configured tagare present. Existing ralphs are untouched.
Test plan
uv run pytest— 661 passed, 1 xpasseduv run ruff check .uv run ruff format --check .uv run ty checkuv run mkdocs build --strictexamples/promise-completion/exits early on<promise>COMPLETE</promise>Note on help-text test robustness
tests/test_cli.py::TestHelpasserts that<promise>...</promise>appears in
--helpoutput. Rich wraps help text differently in headlessCI vs. a local TTY, which splits the tag across lines and breaks a
naïve substring check. The tests strip ANSI codes and collapse
whitespace before asserting, so they're stable on both sides.