fix(harness): stream output with idle watchdog and explicit stdin null (#695) by santoshkumarradha · Pull Request #696 · Agent-Field/agentfield

santoshkumarradha · 2026-06-28T22:10:21Z

Fixes #695

Problem

The CLI harness runner in all three SDKs blocked until the child process fully exited and read its output only at the end. Because it never streamed stdout, it could not apply a no-progress (idle) watchdog. When an opencode provider call stalled (commonly a stalled OpenRouter streaming response), the runner could not tell "stuck" from "working" and waited up to the wall-clock cap (default 1800s), freezing the run and holding a concurrency-semaphore slot the whole time. The TypeScript runner additionally opened stdin as a pipe that was never closed, so a child that probes stdin could hang waiting for EOF.

Fix per language

Same shape in all three SDKs. Return types and full stdout/stderr strings are unchanged, so downstream JSONL parsing is unaffected. The wall-clock timeout is kept as the outer bound.

Python (sdk/python/agentfield/harness/_cli.py):

Stream stdout and stderr concurrently into separate buffers, tracking the timestamp of the last received chunk.
If no output arrives for idle_seconds (resolved from the idle_seconds arg, then env AGENTFIELD_HARNESS_IDLE_SECONDS, then 120s; a value <= 0 disables it), kill the process group and raise TimeoutError.
Set stdin=asyncio.subprocess.DEVNULL and start_new_session=True so the child gets an immediate stdin EOF and its own process group.
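
The Python behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in _cli.py: the names `resolve_idle_seconds`, `kill_process_group`, and `run_streaming`, the 0.1s poll interval, the tuple return shape, and the fallback for an unparsable env value are all assumptions made for the example.

```python
import asyncio
import os
import signal
import sys
import time

DEFAULT_IDLE_SECONDS = 120.0


def resolve_idle_seconds(idle_seconds=None):
    """Explicit argument, then AGENTFIELD_HARNESS_IDLE_SECONDS, then 120s."""
    if idle_seconds is not None:
        return float(idle_seconds)
    raw = os.environ.get("AGENTFIELD_HARNESS_IDLE_SECONDS", "").strip()
    try:
        return float(raw) if raw else DEFAULT_IDLE_SECONDS
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_IDLE_SECONDS


def kill_process_group(proc):
    """SIGKILL the child's process group (POSIX) or the child itself."""
    try:
        if hasattr(os, "killpg"):
            os.killpg(proc.pid, signal.SIGKILL)
        else:
            proc.kill()
    except ProcessLookupError:
        pass  # the child already exited


async def run_streaming(cmd, timeout=1800.0, idle_seconds=None):
    """Run cmd; return (returncode, stdout, stderr) or raise TimeoutError."""
    idle = resolve_idle_seconds(idle_seconds)
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.DEVNULL,  # child sees EOF on stdin at once
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        start_new_session=True,  # child leads its own process group
    )
    buffers = {"out": [], "err": []}
    started = time.monotonic()
    last_output = started

    async def drain(stream, key):
        nonlocal last_output
        while chunk := await stream.read(4096):
            buffers[key].append(chunk)
            last_output = time.monotonic()  # any chunk counts as progress

    readers = [
        asyncio.ensure_future(drain(proc.stdout, "out")),
        asyncio.ensure_future(drain(proc.stderr, "err")),
    ]
    waiter = asyncio.ensure_future(proc.wait())
    try:
        while True:
            await asyncio.wait({waiter}, timeout=0.1)
            if waiter.done():
                break
            now = time.monotonic()
            if idle > 0 and now - last_output > idle:  # <= 0 disables the check
                raise TimeoutError(f"no output for {idle:g}s")
            if now - started > timeout:  # wall-clock cap stays the outer bound
                raise TimeoutError(f"exceeded wall-clock cap of {timeout:g}s")
        await asyncio.gather(*readers)
    except BaseException:
        kill_process_group(proc)
        for task in readers:
            task.cancel()
        await asyncio.gather(*readers, waiter, return_exceptions=True)
        raise
    out = b"".join(buffers["out"]).decode(errors="replace")
    err = b"".join(buffers["err"]).decode(errors="replace")
    return proc.returncode, out, err


if __name__ == "__main__":
    print(asyncio.run(run_streaming([sys.executable, "-c", "print('hi')"])))
```

Both streams feed one shared last-output timestamp, so a child that logs only to stderr still counts as making progress.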

Go (sdk/go/harness/cli.go, plus cli_unix.go and cli_windows.go):

Replace c.Run() and its in-memory buffers with StdoutPipe/StderrPipe, drained in two goroutines that are guarded by a mutex and record the last-activity time.
A one-second ticker checks the idle window and kills the process group when it is exceeded, returning a no-progress error.
Set c.Stdin to an empty reader and run the child in its own process group (Setpgid on Unix, no-op on Windows).

TypeScript (sdk/typescript/src/harness/cli.ts):

Add a setInterval idle watchdog over the existing concurrent data listeners; on idle it SIGKILLs the child and rejects with a no-progress error.
Spawn with stdio: ['ignore', 'pipe', 'pipe'] so the child's stdin receives an immediate EOF instead of an open pipe.
Idle window resolves from the idleSeconds option, then env AGENTFIELD_HARNESS_IDLE_SECONDS, then 120s.

Tests

Each SDK gains a test that spawns a child which prints one line then sleeps far past a short idle threshold (1s), asserting the runner aborts within seconds rather than at the wall-clock cap, plus a test that a normal fast command still returns its full output and exit code. Existing CLI tests that mocked the old wait-for-exit shape were updated to the streaming shape.

Test plan

Python:

```
cd sdk/python && uv run python -m pytest tests/test_run_cli_env.py tests/test_harness_cli.py tests/test_harness_runner.py tests/test_harness_provider_opencode.py tests/test_harness_provider_codex.py tests/test_harness_provider_gemini.py tests/test_harness_functional.py -q

86 passed

```

Go:

```
cd sdk/go && go build ./... # ok
go vet ./harness/ # no issues found
go test ./harness/ -run CLI -count=1 # ok, 7.5s
go test ./harness/ -count=1 -short # 178 passed
```

TypeScript:

```
cd sdk/typescript && npx tsc --noEmit # compilation completed
npm run build # ESM + DTS build success
npm test # 67 files, 633 tests passed
```

The idle watchdog fires within roughly the configured window (about 1 to 2 seconds in the tests) instead of the 1800s wall-clock cap, and fast commands still return their complete output and exit code.

#695)

The CLI harness runner in all three SDKs blocked until the child exited and read output only at the end, so it could not apply a no-progress watchdog. A stalled opencode/OpenRouter streaming call froze the run up to the wall-clock cap (default 1800s) while holding a concurrency-semaphore slot.

Python (sdk/python/agentfield/harness/_cli.py):
- Stream stdout and stderr concurrently; track last-output time; kill the process group and raise TimeoutError if no output arrives for idle_seconds (env AGENTFIELD_HARNESS_IDLE_SECONDS, default 120; <= 0 disables).
- Set stdin=DEVNULL and start_new_session=True. Wall-clock timeout kept.

Go (sdk/go/harness/cli.go, cli_unix.go, cli_windows.go):
- Switch from c.Run() buffers to StdoutPipe/StderrPipe drained in goroutines; poll last-activity; kill the process group on idle. Set c.Stdin to an empty reader and run the child in its own process group.

TypeScript (sdk/typescript/src/harness/cli.ts):
- Add an idle-watchdog interval over the existing data listeners; SIGKILL on idle. Spawn with stdio ['ignore','pipe','pipe'] so the child's stdin gets EOF.

All return shapes are unchanged, so JSONL parsing downstream is unaffected. Adds idle-watchdog and fast-command tests in each SDK.

github-actions · 2026-06-28T22:12:24Z

Performance

SDK	Memory	Δ	Latency	Δ	Tests	Status
Python	9.4 KB	+4%	0.29 µs	-17%	✓	✓
Go	215 B	-23%	0.58 µs	-42%	✓	✓
TS	493 B	+41%	1.60 µs	-20%	✓	✗

⚠ Regression detected:

TypeScript memory: 350 B → 493 B (+41%)

github-actions · 2026-06-28T22:14:51Z

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 84%, aggregate ≥ 85%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

Surface	Current	Baseline	Δ	Status
`control-plane`	87.00%	87.40%	↓ -0.40 pp	🟡
`sdk-go`	91.70%	92.00%	↓ -0.30 pp	🟢
`sdk-python`	93.87%	93.73%	↑ +0.14 pp	🟢
`sdk-typescript`	90.09%	90.42%	↓ -0.33 pp	🟢
`web-ui`	84.83%	84.79%	↑ +0.04 pp	🟡
aggregate	85.63%	85.75%	↓ -0.12 pp	🟡

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.
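
For readers who want to check the gate arithmetic by hand, here is a small sketch that applies the four stated thresholds to the numbers in the table above. The `Surface` type and `check_gate` function are hypothetical and written only for this illustration; how the real gate script evaluates .coverage-gate.toml is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    name: str
    current: float   # coverage on this PR, in percent
    baseline: float  # coverage on the base branch, in percent


def check_gate(surfaces, aggregate,
               min_surface=84.0, min_aggregate=85.0,
               max_surface_drop=1.0, max_aggregate_drop=0.50):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    for s in surfaces:
        if s.current < min_surface:
            failures.append(f"{s.name}: {s.current:.2f}% is below {min_surface}%")
        if s.baseline - s.current > max_surface_drop:
            failures.append(f"{s.name}: dropped more than {max_surface_drop} pp")
    if aggregate.current < min_aggregate:
        failures.append(f"aggregate: {aggregate.current:.2f}% is below {min_aggregate}%")
    if aggregate.baseline - aggregate.current > max_aggregate_drop:
        failures.append(f"aggregate: dropped more than {max_aggregate_drop} pp")
    return failures


# Values copied from the coverage table in this comment.
surfaces = [
    Surface("control-plane", 87.00, 87.40),
    Surface("sdk-go", 91.70, 92.00),
    Surface("sdk-python", 93.87, 93.73),
    Surface("sdk-typescript", 90.09, 90.42),
    Surface("web-ui", 84.83, 84.79),
]
aggregate = Surface("aggregate", 85.63, 85.75)

if __name__ == "__main__":
    print(check_gate(surfaces, aggregate) or "gate passed")
```

The largest per-surface drop is 0.40 pp (control-plane) and the aggregate drop is 0.12 pp, both inside the allowed regression, which is consistent with the reported pass.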

github-actions · 2026-06-28T22:14:53Z

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

Surface	Touched lines	Patch coverage	Status
`control-plane`	0	—	➖ no changes
`sdk-go`	112	87.00%	✅
`sdk-python`	0	—	➖ no changes
`sdk-typescript`	36	88.00%	✅
`web-ui`	0	—	➖ no changes

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

The .ai() path raised TimeoutError on the asyncio safety-net timeout with no retry (rate-limit retry only covers 429/503). A stalled OpenRouter connection therefore failed the reasoner outright. Add a timeout-retry layer that reissues the call on a fresh client pool (the pool is already reset on timeout), bounded by AGENTFIELD_AI_TIMEOUT_RETRIES (default 2, 0 disables). Applies to both the plain and tool-loop .ai paths. Existing deadlock-recovery tests run with retries disabled; added tests cover the retry-recovers and retry-exhausts cases.
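
The retry layer described in this commit could look roughly like the sketch below. It is an assumption-laden illustration rather than the SDK's code: `call_with_timeout_retry`, `resolve_timeout_retries`, `make_call`, and `reset_pool` are hypothetical names, and only the env variable AGENTFIELD_AI_TIMEOUT_RETRIES, its default of 2, and the "0 disables" rule come from the commit message.

```python
import asyncio
import os


def resolve_timeout_retries(default=2):
    """Read AGENTFIELD_AI_TIMEOUT_RETRIES; 0 disables retries."""
    raw = os.environ.get("AGENTFIELD_AI_TIMEOUT_RETRIES")
    if raw is None:
        return default
    try:
        return max(0, int(raw))
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return default


async def call_with_timeout_retry(make_call, reset_pool, timeout, retries=None):
    """Await make_call() under a timeout, reissuing it after each timeout.

    make_call:  hypothetical zero-argument function returning a fresh coroutine.
    reset_pool: hypothetical callback that discards the stalled client pool.
    """
    if retries is None:
        retries = resolve_timeout_retries()
    attempts = 1 + retries
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout)
        except (asyncio.TimeoutError, TimeoutError):
            reset_pool()  # the next attempt starts on fresh connections
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the timeout to the caller
```

With retries set to 0 the loop runs once and the timeout propagates unchanged, which matches how the existing deadlock-recovery tests are said to run.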

… .ai() (#695) The .ai() timeout-retry now also covers transient provider glitches: a malformed 'Unable to get json response', a 5xx, or a dropped connection are retried on a fresh client pool, while permanent client errors (bad request, auth, model-not-found, unsupported-schema) propagate immediately. Observed live with glm-5.2 returning a garbage whitespace body that failed a reasoner outright.
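
One way to picture the transient-versus-permanent split is a small predicate like the one below. Everything here is hypothetical: the `ProviderError` class, its `status` attribute, and the rule that only 5xx statuses count as transient are assumptions for illustration, since the commit message does not show how the SDK actually classifies provider failures.

```python
import asyncio


class ProviderError(Exception):
    """Hypothetical provider failure carrying an optional HTTP status."""

    def __init__(self, message, status=None):
        super().__init__(message)
        self.status = status


def is_transient(exc):
    """Return True when retrying on a fresh client pool is worth trying."""
    # Timeouts and dropped connections are treated as transient.
    if isinstance(exc, (asyncio.TimeoutError, TimeoutError, ConnectionError)):
        return True
    if isinstance(exc, ProviderError):
        if exc.status is not None:
            # Assumption: 5xx is retried; 4xx-style client errors (bad
            # request, auth, model-not-found, unsupported-schema) are not.
            return 500 <= exc.status < 600
        # A malformed body with no status, matched on the message text.
        return "unable to get json response" in str(exc).lower()
    return False
```

Anything the predicate does not recognise is treated as permanent, so an unknown failure propagates immediately instead of being retried.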

santoshkumarradha requested review from a team and AbirAbbas as code owners June 28, 2026 22:10

santoshkumarradha mentioned this pull request Jun 28, 2026

Harness CLI runner waits for process exit without streaming or idle watchdog, causing 30-min stalls (Python, Go, TS) #695

Open

santoshkumarradha wants to merge 3 commits into main from fix/harness-idle-watchdog-695
