fix(harness): stream output with idle watchdog and explicit stdin null (#695) #696
Open
santoshkumarradha wants to merge 3 commits into
#695)

The CLI harness runner in all three SDKs blocked until the child exited and read output only at the end, so it could not apply a no-progress watchdog. A stalled opencode/OpenRouter streaming call froze the run up to the wall-clock cap (default 1800s) while holding a concurrency-semaphore slot.

Python (sdk/python/agentfield/harness/_cli.py):
- Stream stdout and stderr concurrently; track last-output time; kill the process group and raise TimeoutError if no output arrives for idle_seconds (env AGENTFIELD_HARNESS_IDLE_SECONDS, default 120; <= 0 disables).
- Set stdin=DEVNULL and start_new_session=True. Wall-clock timeout kept.

Go (sdk/go/harness/cli.go, cli_unix.go, cli_windows.go):
- Switch from c.Run() buffers to StdoutPipe/StderrPipe drained in goroutines; poll last-activity; kill the process group on idle. Set c.Stdin to an empty reader and run the child in its own process group.

TypeScript (sdk/typescript/src/harness/cli.ts):
- Add an idle-watchdog interval over the existing data listeners; SIGKILL on idle. Spawn with stdio ['ignore','pipe','pipe'] so the child's stdin gets EOF.

All return shapes are unchanged, so JSONL parsing downstream is unaffected. Adds idle-watchdog and fast-command tests in each SDK.
Performance
⚠ Regression detected:
📊 Coverage gate
Thresholds from
✅ Gate passed
No surface regressed past the allowed threshold and the aggregate stayed above the floor.
📐 Patch coverage gate
Threshold: 80% on lines this PR touches vs
✅ Patch gate passed
Every surface whose lines were touched by this PR has patch coverage at or above the threshold.
The .ai() path raised TimeoutError on the asyncio safety-net timeout with no retry (rate-limit retry only covers 429/503). A stalled OpenRouter connection therefore failed the reasoner outright. Add a timeout-retry layer that reissues the call on a fresh client pool (the pool is already reset on timeout), bounded by AGENTFIELD_AI_TIMEOUT_RETRIES (default 2, 0 disables). Applies to both the plain and tool-loop .ai paths. Existing deadlock-recovery tests run with retries disabled; added tests cover the retry-recovers and retry-exhausts cases.
… .ai() (#695) The .ai() timeout-retry now also covers transient provider glitches: a malformed 'Unable to get json response', a 5xx, or a dropped connection are retried on a fresh client pool, while permanent client errors (bad request, auth, model-not-found, unsupported-schema) propagate immediately. Observed live with glm-5.2 returning a garbage whitespace body that failed a reasoner outright.
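The retry behaviour described in the two commit messages above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `PermanentError`, `TransientError`, `call_with_retries`, `make_call`, and `reset_pool` are hypothetical names, and only the env var `AGENTFIELD_AI_TIMEOUT_RETRIES` with its default of 2 comes from the description.

```python
import asyncio
import os


class PermanentError(Exception):
    """Hypothetical stand-in: bad request, auth, model-not-found, unsupported schema."""


class TransientError(Exception):
    """Hypothetical stand-in: malformed body, 5xx, dropped connection."""


def max_timeout_retries(default=2):
    """Read AGENTFIELD_AI_TIMEOUT_RETRIES; 0 disables retries."""
    raw = os.environ.get("AGENTFIELD_AI_TIMEOUT_RETRIES")
    try:
        value = int(raw) if raw is not None else default
    except ValueError:
        value = default  # assumption: ignore an unparsable value
    return max(value, 0)


async def call_with_retries(make_call, reset_pool, timeout, retries=None):
    """Reissue make_call() on timeouts and transient errors, up to `retries` times.

    make_call: zero-argument coroutine function (hypothetical provider call).
    reset_pool: callback that swaps in a fresh client pool (hypothetical).
    """
    if retries is None:
        retries = max_timeout_retries()
    attempt = 0
    while True:
        try:
            return await asyncio.wait_for(make_call(), timeout)
        except PermanentError:
            raise  # permanent client errors propagate immediately
        except (asyncio.TimeoutError, TimeoutError, ConnectionError, TransientError):
            if attempt >= retries:
                raise  # retry budget exhausted
            attempt += 1
            reset_pool()  # next attempt runs on a fresh pool
```

With `retries=2` a call that fails twice and then succeeds returns normally after three attempts; with `retries=0` the first failure propagates, matching the "0 disables" setting.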
Fixes #695
Problem
The CLI harness runner in all three SDKs blocked until the child process fully exited and read its output only at the end. Because it never streamed stdout, it could not apply a no-progress (idle) watchdog. When an opencode provider call stalled (commonly a stalled OpenRouter streaming response), the runner could not tell "stuck" from "working" and waited up to the wall-clock cap (default 1800s), freezing the run and holding a concurrency-semaphore slot the whole time. The TypeScript runner additionally opened stdin as a pipe that was never closed, so a child that probes stdin could hang waiting for EOF.
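To illustrate the stdin point in isolation, here is a small Python sketch (the `CHILD` command and `run_with_null_stdin` helper are made up for the demonstration). A child that reads stdin until EOF returns at once when stdin is the null device, whereas with an open, never-closed pipe the same read would block.

```python
import subprocess
import sys

# Child that reads stdin until EOF, like a CLI probing for piped input.
CHILD = [sys.executable, "-c", "import sys; data = sys.stdin.read(); print(len(data))"]


def run_with_null_stdin(timeout=10.0):
    """Run the child with stdin attached to the null device."""
    return subprocess.run(
        CHILD,
        stdin=subprocess.DEVNULL,  # the child's read() sees EOF immediately
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```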
Fix per language
Same shape in all three SDKs. Return types and full stdout/stderr strings are unchanged, so downstream JSONL parsing is unaffected. The wall-clock timeout is kept as the outer bound.
Python (`sdk/python/agentfield/harness/_cli.py`):
- Stream stdout and stderr concurrently and track the last-output time. If no output arrives for `idle_seconds` (resolved from the `idle_seconds` arg, then env `AGENTFIELD_HARNESS_IDLE_SECONDS`, then 120s; a value <= 0 disables it), kill the process group and raise `TimeoutError`.
- Set `stdin=asyncio.subprocess.DEVNULL` and `start_new_session=True` so the child gets an immediate stdin EOF and its own process group.

Go (`sdk/go/harness/cli.go`, plus `cli_unix.go` and `cli_windows.go`):
- Replace `c.Run()` with in-memory buffers by `StdoutPipe`/`StderrPipe` drained in two goroutines, guarded by a mutex, recording last-activity.
- Set `c.Stdin` to an empty reader and run the child in its own process group (`Setpgid` on Unix, no-op on Windows).

TypeScript (`sdk/typescript/src/harness/cli.ts`):
- Add a `setInterval` idle watchdog over the existing concurrent data listeners; on idle it SIGKILLs the child and rejects with a no-progress error.
- Spawn with `stdio: ['ignore', 'pipe', 'pipe']` so the child's stdin receives an immediate EOF instead of an open pipe.
- Resolve the idle threshold from the `idleSeconds` option, then env `AGENTFIELD_HARNESS_IDLE_SECONDS`, then 120s.

Tests
Each SDK gains a test that spawns a child which prints one line then sleeps far past a short idle threshold (1s), asserting the runner aborts within seconds rather than at the wall-clock cap, plus a test that a normal fast command still returns its full output and exit code. Existing CLI tests that mocked the old wait-for-exit shape were updated to the streaming shape.
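For readers who want the shape in one place, below is a minimal Python sketch of a streaming runner with an idle watchdog. It is an illustration under assumptions, not the code in `_cli.py`: `resolve_idle_seconds` and `run_with_idle_watchdog` are hypothetical names, the 0.1s poll interval and 4096-byte reads are arbitrary, and the process-group kill via `os.killpg` is POSIX-only.

```python
import asyncio
import os
import signal
import time


def resolve_idle_seconds(idle_seconds=None, default=120.0):
    """Explicit argument first, then the env var, then the default."""
    if idle_seconds is not None:
        return float(idle_seconds)
    raw = os.environ.get("AGENTFIELD_HARNESS_IDLE_SECONDS")
    try:
        return float(raw) if raw is not None else default
    except ValueError:
        return default  # assumption: ignore an unparsable value


async def run_with_idle_watchdog(cmd, idle_seconds=None, timeout=1800.0):
    """Hypothetical runner: returns (returncode, stdout, stderr)."""
    idle = resolve_idle_seconds(idle_seconds)
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.DEVNULL,  # child sees EOF on stdin
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        start_new_session=True,  # POSIX: child leads its own process group
    )
    out, err = [], []
    started = last_output = time.monotonic()

    async def drain(stream, sink):
        nonlocal last_output
        while chunk := await stream.read(4096):
            sink.append(chunk)
            last_output = time.monotonic()  # any output counts as progress

    tasks = [
        asyncio.ensure_future(drain(proc.stdout, out)),
        asyncio.ensure_future(drain(proc.stderr, err)),
    ]
    try:
        while True:
            _, pending = await asyncio.wait(tasks, timeout=0.1)
            if not pending:
                break  # both streams reached EOF
            now = time.monotonic()
            if idle > 0 and now - last_output > idle:
                raise TimeoutError(f"no output for {idle:g}s")
            if now - started > timeout:
                raise TimeoutError(f"exceeded wall-clock timeout of {timeout:g}s")
        returncode = await proc.wait()
    except TimeoutError:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the whole group
        except ProcessLookupError:
            pass  # child already gone
        await proc.wait()
        raise
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return returncode, b"".join(out).decode(), b"".join(err).decode()
```

A test in the spirit of the ones described above would call this with a child that prints one line and then sleeps, pass `idle_seconds=1`, and assert that `TimeoutError` is raised within a few seconds; a second test would run a fast command and check the returned exit code and output.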
Test plan
Python:
```
cd sdk/python && uv run python -m pytest tests/test_run_cli_env.py tests/test_harness_cli.py tests/test_harness_runner.py tests/test_harness_provider_opencode.py tests/test_harness_provider_codex.py tests/test_harness_provider_gemini.py tests/test_harness_functional.py -q
86 passed
```
Go:
```
cd sdk/go && go build ./... # ok
go vet ./harness/ # no issues found
go test ./harness/ -run CLI -count=1 # ok, 7.5s
go test ./harness/ -count=1 -short # 178 passed
```
TypeScript:
```
cd sdk/typescript && npx tsc --noEmit # compilation completed
npm run build # ESM + DTS build success
npm test # 67 files, 633 tests passed
```
The idle watchdog fires within roughly the configured window (about 1 to 2 seconds in the tests) instead of the 1800s wall-clock cap, and fast commands still return their complete output and exit code.