Skip to content

feat(loop): <promise> completion tag for early exit#46

Open
paulnsorensen wants to merge 3 commits intocomputerlovetech:mainfrom
paulnsorensen:main
Open

feat(loop): <promise> completion tag for early exit#46
paulnsorensen wants to merge 3 commits intocomputerlovetech:mainfrom
paulnsorensen:main

Conversation

@paulnsorensen
Copy link
Copy Markdown

Summary

Adds opt-in early-exit to the loop via a structured completion tag. When
stop_on_completion_signal: true is set in frontmatter and the agent
emits a matching <promise>TEXT</promise> tag in its output, ralphify
stops after the current iteration instead of continuing to the iteration
budget. The default inner text is RALPH_PROMISE_COMPLETE; override it
with the optional completion_signal field.

Why the <promise> tag format

The tag format intentionally mirrors what the Claude Code ralphify
plugin
(Ralph-Wiggum) already uses to signal structured completion
back to its harness. Aligning on the same <promise>...</promise> shape
means a single prompt body works on both sides — a ralph written against
the Claude plugin can stop a ralphify loop with no adapter code, and
vice versa. Ralphify still runs its own command/prompt loop; only the
completion-tag format is shared.

What's new

  • src/ralphify/_promise.py — strict <promise>TEXT</promise> parser
    with whitespace normalization (bare marker text no longer counts)
  • Two optional frontmatter fields:
    • completion_signal — inner text of the promise tag (default
      RALPH_PROMISE_COMPLETE)
    • stop_on_completion_signal — bool gate for early exit (default
      false)
  • examples/promise-completion/RALPH.md — canonical example, surfaced
    from ralph scaffold help and the cookbook
  • ralph run --help and ralph scaffold --help now explain the pattern
  • Docs updated across getting-started, cli, how-it-works, agents,
    cookbook, quick-reference
  • Regression coverage in tests/test_promise.py,
    tests/test_agent.py, tests/test_cli.py, tests/test_engine.py
    (changed modules land at ~98% coverage)

Backwards compatibility

Default behavior is unchanged. Loops still run to their iteration budget
unless both stop_on_completion_signal: true and the configured tag
are present. Existing ralphs are untouched.

Test plan

  • uv run pytest — 661 passed, 1 xpassed
  • uv run ruff check .
  • uv run ruff format --check .
  • uv run ty check
  • uv run mkdocs build --strict
  • Example ralph in examples/promise-completion/ exits early on
    <promise>COMPLETE</promise>

Note on help-text test robustness

tests/test_cli.py::TestHelp asserts that <promise>...</promise>
appears in --help output. Rich wraps help text differently in headless
CI vs. a local TTY, which splits the tag across lines and breaks a
naïve substring check. The tests strip ANSI codes and collapse
whitespace before asserting, so they're stable on both sides.

Adds opt-in early-exit to the loop via a structured completion tag.
When stop_on_completion_signal: true is set in frontmatter and the agent
emits a matching <TEXT> tag in its output, ralphify stops after the
current iteration instead of continuing to the iteration budget.

- _promise.py: strict TEXT parser with whitespace normalization
- Two optional frontmatter fields: completion_signal and stop_on_completion_signal
- examples/promise-completion/RALPH.md: canonical example
- Docs updated across getting-started, cli, how-it-works, agents, cookbook, quick-reference
- Tests in test_promise.py, test_agent.py, test_cli.py, test_engine.py (~98% coverage)
paulnsorensen and others added 2 commits April 18, 2026 20:02
#6)

* feat: seed CLI adapter layer with Claude, Codex, Copilot adapters

Introduces a pluggable CLIAdapter Protocol and lands the three concrete
adapters that exercise it, so the abstraction ships with demonstrated
value on day one.

- src/ralphify/adapters/__init__.py — CLIAdapter Protocol, AdapterEvent,
  first-match ADAPTERS registry, select_adapter() dispatch with
  GenericAdapter fallback.
- src/ralphify/adapters/_generic.py — blocking-path fallback adapter.
- src/ralphify/adapters/claude.py — stem "claude", --output-format
  stream-json --verbose, parses tool_use blocks in assistant messages,
  extracts completion signal from terminal result event.
- src/ralphify/adapters/codex.py — stem "codex", --json, maps
  CollabToolCall/McpToolCall/CommandExecution to tool_use events, scans
  TurnCompleted/TaskComplete for promise tags.
- src/ralphify/adapters/copilot.py — stem "copilot" (not "gh"),
  --output-format json, renders_structured=False, supports_soft_windown
  =False.

install_wind_down_hook is part of the Protocol surface but every
concrete adapter currently raises NotImplementedError — the actual hook
plumbing (soft wind-down, AgentHook fanout, max_turns enforcement)
lands in subsequent PRs. This PR is pure additions with no engine
wiring: adapters are registered and discoverable, but the run loop
still uses the old hard-coded Claude path.

Tests: 50 new tests covering each adapter's event parsing, command
building, completion-signal extraction, and registry dispatch.

Co-Authored-By: WOZCODE <contact@withwoz.com>

* feat: route Claude detection and promise completion through adapter

Wire the CLI adapter layer into the three sites that previously
hard-coded Claude knowledge:

- _agent.py: execute_agent resolves the adapter (or accepts one from
  the caller) and delegates flag construction to adapter.build_command.
  _run_agent_streaming no longer appends --output-format/--verbose
  itself — the cmd it receives is already flagged.
- _console_emitter.py: startup-hint text picks "live activity on" vs
  "live output on" via adapter.renders_structured instead of a
  claude-binary check.
- engine.py: promise completion runs through
  adapter.extract_completion_signal(captured_stdout, signal).
  ClaudeAdapter scans the terminal result event only; GenericAdapter
  scans full stdout to preserve behavior for unknown CLIs. The
  engine-side _record_promise_fragment incremental scanner is gone.

Tightens ClaudeAdapter.extract_completion_signal to return False when
no result event is present so embedded <promise> tags inside status
or assistant JSON can't trigger early completion.

* fix: address copilot review feedback on adapter seed PR

- Rename `supports_soft_windown` → `supports_soft_wind_down` across the
  Protocol and all adapters to match the `install_wind_down_hook` /
  "wind-down" terminology used elsewhere (caught before it baked into
  the public adapter surface).
- Claude and Copilot `build_command()` now detect the `--output-format`
  flag pair and overwrite a user-supplied value instead of treating the
  flag and value tokens independently (which produced invalid argv when
  a caller pre-set `--output-format <other>`).
- Align Claude `parse_event()` docstring with actual behavior: `result`
  events return `kind="result"` and non-assistant events return
  `kind="message"` rather than `None`.
- Flip Codex `renders_structured` to False so the peek panel stays in
  raw-line mode until the emitter is adapter-driven; Codex's streaming
  path currently feeds an `_IterationPanel` that only understands
  Claude's stream-json schema.

* refactor: split renders_structured into supports_streaming and renders_structured_peek

Addresses the architectural concern in PR #6 review: the old
`renders_structured` flag conflated two orthogonal capabilities:

1. Whether the adapter emits a structured JSON stream that the execution
   path can parse into on_activity callbacks.
2. Whether the console peek panel understands the adapter's event schema
   and should render it via `_IterationPanel` instead of raw lines.

Before the split, an adapter had to choose between a useful streaming
path with an empty peek panel (Codex's previous state) or a working raw
peek with no activity callbacks. The split makes the correct combination
representable:

- ClaudeAdapter: supports_streaming=True, renders_structured_peek=True
- CodexAdapter:  supports_streaming=True, renders_structured_peek=False
- CopilotAdapter / GenericAdapter: both False

Codex now takes the streaming path again (on_activity callbacks fire)
while the peek panel stays in raw-line mode until the emitter is
adapter-driven.

Callers updated:
- _agent.execute_agent dispatches on adapter.supports_streaming
- _console_emitter consumes adapter.renders_structured_peek via
  _agent_renders_structured_peek (renamed)
- tests/conftest autouse fixture forces both flags to False

* perf: skip stdout capture when adapter exposes result_text

The previous adapter seed forced capture_stdout=True on every iteration
so the engine could feed the buffered transcript into
adapter.extract_completion_signal. For verbose streaming agents this
regressed memory vs the prior tail-scan approach, and Claude in
particular was paying for a full transcript buffer it did not need:
agent.result_text already carries the terminal assistant message.

Add a requires_full_stdout_for_completion capability flag and a
keyword-only extract_completion_signal(result_text=, stdout=,
user_signal=) signature so each adapter declares its own input. Claude
sets the flag False and reads result_text directly; Codex / Copilot /
Generic keep the full-stdout path.

The engine now gates capture on log_dir or
(stop_on_completion_signal AND requires_full_stdout_for_completion),
so streaming agents skip the buffer entirely unless logging is on, and
non-streaming agents only buffer when the user opts into completion
signalling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: WOZCODE <contact@withwoz.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
)

* refactor: extract adapter Protocol and registry into _protocol.py

The adapter package's __init__ both imports concrete adapter modules
(to populate ADAPTERS) and defines the Protocol those modules depend
on. That's a latent circular-import hazard: it works today only because
__init__ defines the Protocol before it imports concrete adapters. As
soon as an adapter grows a dependency that could re-enter the package
namespace, the order-of-execution becomes load-bearing.

Split the Protocol, the AdapterEvent NamedTuple, the Literal kind
enums, and the ADAPTERS list into a new leaf module _protocol.py.
Concrete adapters (claude, codex, copilot, _generic) now import from
_protocol directly. __init__.py re-exports the public names so existing
`from ralphify.adapters import CLIAdapter` imports keep working, and
keeps select_adapter plus the builtin-registration bootstrap.

No behaviour change.

* feat: add turn-cap event types and max_turns frontmatter fields

Preparatory foundation for the forthcoming turn-cap enforcement work:
no enforcement yet, just the event-type surface and the validated
RunConfig fields that enforcement will read.

- _events.py: TOOL_USE, ITERATION_TURN_APPROACHING_LIMIT, and
  ITERATION_TURN_CAPPED event kinds with typed data payloads
  (ToolUseData, TurnApproachingLimitData, TurnCappedData) threaded
  through the EventData union.
- _frontmatter.py: FIELD_MAX_TURNS and FIELD_MAX_TURNS_GRACE
  constants.
- _run_types.py: max_turns (int | None, default None) and
  max_turns_grace (int, default 2) fields on RunConfig.
- cli.py: _validate_max_turns / _validate_max_turns_grace helpers
  with range and type checks; grace must be strictly less than the
  cap when the cap is set. _build_run_config threads both values into
  the returned RunConfig.
- tests/test_cli_frontmatter_fields.py: unit coverage for the
  validators plus end-to-end _build_run_config assertions for the
  default, valid, and invalid cases.

Nothing consumes max_turns yet; it will be read by the adapter-driven
enforcement path in a follow-up PR.

* fix: mock shutil.which in frontmatter field tests

CI runners don't have 'claude' on PATH, so _build_run_config's agent
validation was failing before reaching the turn-cap assertions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant