fix(harness): stream output with idle watchdog and explicit stdin null (#695) #696

Open
santoshkumarradha wants to merge 3 commits into main from fix/harness-idle-watchdog-695

@santoshkumarradha

Member

Fixes #695

Problem

The CLI harness runner in all three SDKs blocked until the child process fully exited and read its output only at the end. Because it never streamed stdout, it could not apply a no-progress (idle) watchdog. When an opencode provider call stalled (commonly a hung OpenRouter streaming response), the runner could not tell "stuck" from "working" and waited up to the wall-clock cap (default 1800s), freezing the run and holding a concurrency-semaphore slot the whole time. The TypeScript runner additionally opened stdin as a pipe that was never closed, so a child that probed stdin could hang waiting for EOF.

Fix per language

Same shape in all three SDKs. Return types and full stdout/stderr strings are unchanged, so downstream JSONL parsing is unaffected. The wall-clock timeout is kept as the outer bound.

Python (sdk/python/agentfield/harness/_cli.py):

  • Stream stdout and stderr concurrently into separate buffers, tracking the timestamp of the last received chunk.
  • If no output arrives for idle_seconds (resolved from the idle_seconds arg, then env AGENTFIELD_HARNESS_IDLE_SECONDS, then 120s; a value <= 0 disables it), kill the process group and raise TimeoutError.
  • Set stdin=asyncio.subprocess.DEVNULL and start_new_session=True so the child gets an immediate stdin EOF and its own process group.
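
The three Python bullets above can be sketched roughly as follows. This is a minimal illustration and not the code in `_cli.py`: the names `run_streaming`, `resolve_idle_seconds`, and `kill_group`, the 0.25s poll interval, the 4096-byte read size, and the `(exit_code, stdout, stderr)` return shape are all assumptions made for the example. Only the resolution order, the `stdin`/`start_new_session` settings, and the kill-and-raise behavior come from the description.

```python
import asyncio
import os
import signal
import time


def resolve_idle_seconds(idle_seconds=None):
    """Explicit argument first, then the env var, then the 120s default."""
    if idle_seconds is not None:
        return float(idle_seconds)
    raw = os.environ.get("AGENTFIELD_HARNESS_IDLE_SECONDS")
    return float(raw) if raw else 120.0


def kill_group(proc):
    """Kill the child's process group (POSIX); fall back to the child alone."""
    try:
        if hasattr(os, "killpg"):
            os.killpg(proc.pid, signal.SIGKILL)
        else:
            proc.kill()
    except ProcessLookupError:
        pass  # the child already exited


async def run_streaming(cmd, idle_seconds=None, timeout=1800.0):
    """Hypothetical runner: returns (exit_code, stdout, stderr)."""
    idle = resolve_idle_seconds(idle_seconds)  # <= 0 disables the idle check
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.DEVNULL,  # immediate EOF on stdin
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        start_new_session=True,  # own session, hence own process group
    )
    out, err = bytearray(), bytearray()
    started = last = time.monotonic()

    async def drain(stream, buf):
        nonlocal last
        while chunk := await stream.read(4096):
            buf.extend(chunk)
            last = time.monotonic()  # any output counts as progress

    async def pump():
        await asyncio.gather(drain(proc.stdout, out), drain(proc.stderr, err))
        return await proc.wait()

    task = asyncio.ensure_future(pump())
    try:
        while not task.done():
            await asyncio.wait({task}, timeout=0.25)
            now = time.monotonic()
            if task.done():
                break
            if idle > 0 and now - last > idle:
                raise TimeoutError(f"no output for {idle:g}s")
            if now - started > timeout:  # wall-clock cap stays the outer bound
                raise TimeoutError(f"exceeded wall-clock cap of {timeout:g}s")
    except TimeoutError:
        kill_group(proc)
        await task  # pipes reach EOF once the group is dead; this reaps the child
        raise
    return task.result(), out.decode(errors="replace"), err.decode(errors="replace")
```

In this sketch, a child that prints one line and then sleeps is aborted shortly after the idle window elapses, while a fast command returns its full output and exit code, which mirrors the two test cases described under Tests.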

Go (sdk/go/harness/cli.go, plus cli_unix.go and cli_windows.go):

  • Replace c.Run() and its in-memory buffers with StdoutPipe/StderrPipe, drained by two goroutines that append to mutex-guarded buffers and record the last-activity time.
  • A one-second ticker checks the idle window and kills the process group when it is exceeded, returning a no-progress error.
  • Set c.Stdin to an empty reader and run the child in its own process group (Setpgid on Unix, no-op on Windows).

TypeScript (sdk/typescript/src/harness/cli.ts):

  • Add a setInterval idle watchdog over the existing concurrent data listeners; on idle it SIGKILLs the child and rejects with a no-progress error.
  • Spawn with stdio: ['ignore', 'pipe', 'pipe'] so the child's stdin receives an immediate EOF instead of an open pipe.
  • Idle window resolves from the idleSeconds option, then env AGENTFIELD_HARNESS_IDLE_SECONDS, then 120s.

Tests

Each SDK gains a test that spawns a child which prints one line then sleeps far past a short idle threshold (1s), asserting the runner aborts within seconds rather than at the wall-clock cap, plus a test that a normal fast command still returns its full output and exit code. Existing CLI tests that mocked the old wait-for-exit shape were updated to the streaming shape.

Test plan

Python:

```
cd sdk/python && uv run python -m pytest tests/test_run_cli_env.py tests/test_harness_cli.py tests/test_harness_runner.py tests/test_harness_provider_opencode.py tests/test_harness_provider_codex.py tests/test_harness_provider_gemini.py tests/test_harness_functional.py -q

86 passed

```

Go:

```
cd sdk/go && go build ./... # ok
go vet ./harness/ # no issues found
go test ./harness/ -run CLI -count=1 # ok, 7.5s
go test ./harness/ -count=1 -short # 178 passed
```

TypeScript:

```
cd sdk/typescript && npx tsc --noEmit # compilation completed
npm run build # ESM + DTS build success
npm test # 67 files, 633 tests passed
```

The idle watchdog fires within roughly the configured window (about 1 to 2 seconds in the tests) instead of the 1800s wall-clock cap, and fast commands still return their complete output and exit code.

#695)

The CLI harness runner in all three SDKs blocked until the child exited and
read output only at the end, so it could not apply a no-progress watchdog.
A stalled opencode/OpenRouter streaming call froze the run up to the
wall-clock cap (default 1800s) while holding a concurrency-semaphore slot.

Python (sdk/python/agentfield/harness/_cli.py):
- Stream stdout and stderr concurrently; track last-output time; kill the
  process group and raise TimeoutError if no output arrives for idle_seconds
  (env AGENTFIELD_HARNESS_IDLE_SECONDS, default 120; <= 0 disables).
- Set stdin=DEVNULL and start_new_session=True. Wall-clock timeout kept.

Go (sdk/go/harness/cli.go, cli_unix.go, cli_windows.go):
- Switch from c.Run() buffers to StdoutPipe/StderrPipe drained in goroutines;
  poll last-activity; kill the process group on idle. Set c.Stdin to an empty
  reader and run the child in its own process group.

TypeScript (sdk/typescript/src/harness/cli.ts):
- Add an idle-watchdog interval over the existing data listeners; SIGKILL on
  idle. Spawn with stdio ['ignore','pipe','pipe'] so the child's stdin gets EOF.

All return shapes are unchanged, so JSONL parsing downstream is unaffected.
Adds idle-watchdog and fast-command tests in each SDK.
@santoshkumarradha requested review from a team and AbirAbbas as code owners June 28, 2026 22:10
@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Performance

| SDK | Memory | Δ | Latency | Δ | Tests | Status |
|---|---|---|---|---|---|---|
| Python | 9.4 KB | +4% | 0.29 µs | -17% | | |
| Go | 215 B | -23% | 0.58 µs | -42% | | |
| TS | 493 B | +41% | 1.60 µs | -20% | | |

Regression detected:

  • TypeScript memory: 350 B → 493 B (+41%)

@github-actions

Contributor

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 84%, aggregate ≥ 85%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

| Surface | Current | Baseline | Δ |
|---|---|---|---|
| control-plane | 87.00% | 87.40% | ↓ -0.40 pp 🟡 |
| sdk-go | 91.70% | 92.00% | ↓ -0.30 pp 🟢 |
| sdk-python | 93.87% | 93.73% | ↑ +0.14 pp 🟢 |
| sdk-typescript | 90.09% | 90.42% | ↓ -0.33 pp 🟢 |
| web-ui | 84.83% | 84.79% | ↑ +0.04 pp 🟡 |
| aggregate | 85.63% | 85.75% | ↓ -0.12 pp 🟡 |

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.
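
For readers who want to check the gate arithmetic, the four thresholds can be applied to the reported numbers as in the sketch below. The threshold values and coverage figures are copied from this comment; the dictionary key names and the `gate_failures` function are hypothetical and are not taken from `.coverage-gate.toml` or the bot's implementation.

```python
# Threshold values copied from the bot comment; the key names are hypothetical.
THRESHOLDS = {
    "min_surface": 84.0,
    "min_aggregate": 85.0,
    "max_surface_regression_pp": 1.0,
    "max_aggregate_regression_pp": 0.50,
}

# (current %, baseline %) per surface, as reported in the coverage table.
SURFACES = {
    "control-plane": (87.00, 87.40),
    "sdk-go": (91.70, 92.00),
    "sdk-python": (93.87, 93.73),
    "sdk-typescript": (90.09, 90.42),
    "web-ui": (84.83, 84.79),
}
AGGREGATE = (85.63, 85.75)


def gate_failures(surfaces, aggregate, t=THRESHOLDS):
    """Return a list of human-readable violations; an empty list means pass."""
    failures = []
    for name, (current, baseline) in surfaces.items():
        if current < t["min_surface"]:
            failures.append(f"{name}: {current:.2f}% is below the per-surface floor")
        if baseline - current > t["max_surface_regression_pp"]:
            failures.append(f"{name}: regressed {baseline - current:.2f} pp")
    current, baseline = aggregate
    if current < t["min_aggregate"]:
        failures.append(f"aggregate: {current:.2f}% is below the floor")
    if baseline - current > t["max_aggregate_regression_pp"]:
        failures.append(f"aggregate: regressed {baseline - current:.2f} pp")
    return failures


print(gate_failures(SURFACES, AGGREGATE) or "gate passed")  # prints "gate passed"
```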

@github-actions

Contributor

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

| Surface | Touched lines | Patch coverage | Status |
|---|---|---|---|
| control-plane | 0 | ➖ | no changes |
| sdk-go | 112 | 87.00% | |
| sdk-python | 0 | ➖ | no changes |
| sdk-typescript | 36 | 88.00% | |
| web-ui | 0 | ➖ | no changes |

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

The .ai() path raised TimeoutError on the asyncio safety-net timeout with no
retry (rate-limit retry only covers 429/503). A stalled OpenRouter connection
therefore failed the reasoner outright. Add a timeout-retry layer that reissues
the call on a fresh client pool (the pool is already reset on timeout), bounded
by AGENTFIELD_AI_TIMEOUT_RETRIES (default 2, 0 disables). Applies to both the
plain and tool-loop .ai paths. Existing deadlock-recovery tests run with retries
disabled; added tests cover the retry-recovers and retry-exhausts cases.
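
A minimal sketch of a bounded timeout-retry layer along these lines is shown below. The helper names `resolve_timeout_retries` and `call_with_timeout_retry` and the `make_call` factory are hypothetical; the sketch assumes the client pool is reset elsewhere when a timeout occurs, as the commit message states, and only illustrates the retry accounting driven by AGENTFIELD_AI_TIMEOUT_RETRIES.

```python
import asyncio
import os


def resolve_timeout_retries():
    """AGENTFIELD_AI_TIMEOUT_RETRIES if set, else 2; 0 disables retries."""
    raw = os.environ.get("AGENTFIELD_AI_TIMEOUT_RETRIES")
    return max(0, int(raw)) if raw else 2


async def call_with_timeout_retry(make_call, timeout, retries=None):
    """Await make_call() under a timeout, reissuing it up to `retries` times.

    make_call is a zero-argument factory so every attempt gets a fresh
    coroutine (and, in the real SDK, presumably a fresh client pool).
    """
    if retries is None:
        retries = resolve_timeout_retries()
    attempt = 0
    while True:
        try:
            return await asyncio.wait_for(make_call(), timeout)
        except asyncio.TimeoutError:
            if attempt >= retries:
                raise  # retries exhausted: surface the timeout to the caller
            attempt += 1
```

With `retries=2` a call may be issued up to three times in total; with `retries=0` the first timeout propagates, which matches how the existing deadlock-recovery tests are described as running.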
… .ai() (#695)

The .ai() timeout-retry now also covers transient provider glitches: a
malformed 'Unable to get json response', a 5xx, or a dropped connection are
retried on a fresh client pool, while permanent client errors (bad request,
auth, model-not-found, unsupported-schema) propagate immediately. Observed live
with glm-5.2 returning a garbage whitespace body that failed a reasoner outright.
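
The transient-versus-permanent split described above might be expressed as a small classifier like the one below. This is a sketch under stated assumptions: `ProviderError` is a hypothetical stand-in for whatever exception type the SDK actually raises, and mapping "5xx" and "permanent client errors" onto an HTTP `status` attribute is an assumption of the example, not something the commit message specifies.

```python
class ProviderError(Exception):
    """Hypothetical stand-in for the SDK's provider exception type."""

    def __init__(self, message, status=None):
        super().__init__(message)
        self.status = status  # HTTP status code, when the provider returned one


def is_transient(exc):
    """Return True if the error looks worth retrying on a fresh client pool."""
    # Timeouts and dropped connections (ConnectionResetError, BrokenPipeError, ...).
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    if isinstance(exc, ProviderError):
        if exc.status is not None:
            # 5xx is retried; 4xx (bad request, auth, model-not-found,
            # unsupported-schema) propagates immediately.
            return exc.status >= 500
        # Malformed body with no usable status, e.g. a whitespace-only response.
        return "unable to get json response" in str(exc).lower()
    return False
```

A retry loop would then reissue the call only when `is_transient(exc)` is true and re-raise everything else unchanged.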
Development

Successfully merging this pull request may close these issues.

Harness CLI runner waits for process exit without streaming or idle watchdog, causing 30-min stalls (Python, Go, TS)
