[aw-failures] Copilot harness drops failure diagnostics — fallback safe-output emission fails with EROFS (read-only safeoutputs) after retry e [Content truncated due to length] #35888

@github-actions

Sub-issue of #35661 (token-budget exhaustion). This tracks a distinct, separately-fixable robustness/observability bug that the token-cap failures expose: when a copilot-engine run fails and the harness tries to auto-emit a fallback safe-output, the write fails on a read-only filesystem, so the agent's failure signal is silently dropped.

Problem statement

After the Copilot CLI exits non-zero (here, because it exhausted its 3 --continue retries against the 25M effective-tokens cap), the copilot-harness attempts to auto-emit a missing_tool safe-output to record why the run failed. That emission fails:

missing_tool emission failed: EROFS: read-only file system,
open '/home/runner/work/_temp/gh-aw/safeoutputs/outputs.jsonl'

Because the write is dropped, the run surfaces only a generic non-zero exit — the structured failure reason (numerous permission-denials, token-cap) never reaches agent_output.json / safe-outputs, degrading downstream triage and auto-issue quality.

Affected workflows / runs (6h window, 2026-05-30)

EROFS fallback-emission failure observed in 2 of the 3 runs that hit the 25M token cap in this window:

| Workflow | Run | Token-cap peak | Harness state | EROFS on fallback |
|---|---|---|---|---|
| Daily Compiler Quality Check | §26673439829 | 25,003,302 / 25,000,000 | permissionDeniedCount=11, hasNumerousPermissionDenied=true, retries exhausted | ✅ yes |
| Copilot CLI Deep Research Agent | §26675076543 | 25,526,138 / 25,000,000 | retries exhausted (exitCode=1) | ✅ yes |
| Documentation Noob Tester | §26675130908 | 25,274,612 / 25,000,000 | 3 retries exhausted | not observed |

Harness diagnostic excerpt (run 26673439829)

[copilot-harness] attempt 1 failed: exitCode=1 isCAPIError400=false isMCPPolicyError=false
  isModelNotSupportedError=false isAuthError=false permissionDeniedCount=11
  hasNumerousPermissionDenied=true hasOutput=true retriesRemaining=3
[copilot-harness] missing_tool emission failed: EROFS: read-only file system,
  open '/home/runner/work/_temp/gh-aw/safeoutputs/outputs.jsonl'
... Last error: CAPIError: 429 Maximum effective tokens exceeded (25003302.30 / 25000000).

Probable root cause

By the time the harness runs its post-failure fallback-emission path, the safe-outputs file (/home/runner/work/_temp/gh-aw/safeoutputs/outputs.jsonl) sits on a read-only mount, most likely because the safe-outputs collection window has closed or the bind mount was switched to read-only during agent-container teardown. The fallback missing_tool/diagnostic write therefore hits EROFS and is dropped rather than retried or redirected.

This is independent of the token-cap root cause in #35661: even after #35661's remediations land, any agent that fails after the safe-outputs window closes will lose its diagnostic via the same path.

Proposed remediation

  1. Keep the safe-outputs sink writable through fallback emission — ensure outputs.jsonl (or an equivalent fallback file) remains writable for the harness's own post-failure emission, even after the agent container is torn down.
  2. Fail-safe fallback path — if the primary outputs.jsonl is read-only, write the fallback diagnostic to a guaranteed-writable location (e.g. $RUNNER_TEMP/gh-aw/agent/) and have the safe_outputs job ingest it.
  3. Surface, don't swallow — log the EROFS at error level into the step summary so the lost diagnostic is at least visible, and emit a report_incomplete-style signal so the failure reason is not reduced to a bare exit code.
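
Remediations 2 and 3 can be sketched together. The following is a minimal Python illustration of the pattern only, not the harness's actual code: the function name emit_with_fallback, the file name fallback-outputs.jsonl, and the record fields are all hypothetical, and the real harness may be written in another language. It appends one JSON line to the primary sink, and on a read-only or permission error it logs the failure and redirects the same line to a fallback directory.

```python
import errno
import json
import os
import sys


def emit_with_fallback(record, primary_path, fallback_dir):
    """Append `record` as one JSON line; return the path actually written.

    Hypothetical helper: names and file layout are assumptions for illustration.
    """
    line = json.dumps(record) + "\n"
    try:
        with open(primary_path, "a", encoding="utf-8") as fh:
            fh.write(line)
        return primary_path
    except OSError as exc:
        # Redirect only for read-only / permission failures; re-raise the rest.
        if exc.errno not in (errno.EROFS, errno.EACCES, errno.EPERM):
            raise
        # Surface the failure instead of swallowing it.
        print(f"error: safe-output emission failed ({exc}); using fallback",
              file=sys.stderr)

    os.makedirs(fallback_dir, exist_ok=True)
    fallback_path = os.path.join(fallback_dir, "fallback-outputs.jsonl")
    with open(fallback_path, "a", encoding="utf-8") as fh:
        fh.write(line)
    return fallback_path
```

A caller would pass the safeoutputs path from the log above as primary_path and a directory under $RUNNER_TEMP/gh-aw/agent/ as fallback_dir. The safe_outputs job would still need to ingest the fallback file, which this sketch does not cover.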

Success criteria / verification

  • Over a 7-day window, zero copilot-engine runs log missing_tool emission failed: EROFS (or any EROFS ... outputs.jsonl).
  • When an agent fails after retry exhaustion, the run's agent_output.json / safe-outputs contains the structured failure reason (token-cap, permission-denied count) rather than only a generic non-zero exit.
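
The first criterion could be checked mechanically over collected run logs. The sketch below is one possible approach, assuming the logs are already available as a mapping from run ID to log text (how they are fetched is out of scope); the function name runs_with_erofs is hypothetical. The pattern tolerates the line wrap seen in the harness excerpt, where the path appears on the line after EROFS.

```python
import re

# Matches "EROFS ..." followed, on the same or the next line, by outputs.jsonl.
EROFS_EMISSION = re.compile(r"EROFS[^\n]*\n?[^\n]*outputs\.jsonl")


def runs_with_erofs(logs_by_run):
    """Return the sorted run IDs whose log text shows an EROFS write to outputs.jsonl."""
    return sorted(
        run_id
        for run_id, text in logs_by_run.items()
        if EROFS_EMISSION.search(text)
    )
```

The criterion is met when this returns an empty list for every copilot-engine run in the 7-day window.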

Context — parent #35661 token-cap is still recurring

This window (2026-05-30 ~01:40–07:40 UTC) confirms the parent's 25M effective-tokens cap failure mode is not abating: 3 distinct copilot-engine workflows hit it (table above), one of which (Copilot CLI Deep Research Agent) was already named in #35661's original 2026-05-29 sample — i.e. a repeat offender. #35661's proposed remediations (per-workflow max-turns, MCP tool-surface trimming, --no-resume on retry after a 429) remain unimplemented and unverified. The related infra tracker #35780 (awf-squid unhealthy) did not recur in this window.

Confidence & unknowns

  • High confidence the EROFS emission failure is real and occurs in ≥2 distinct workflows (direct log evidence).
  • Medium confidence on the exact mount-lifecycle cause (read-only at teardown) — not directly inspected; inferred from path + timing. The remediation should be validated against the actual mount setup in the agent job.
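
One way to test the mount-lifecycle hypothesis directly is to probe the mount flags at the moment the fallback path runs. The sketch below is a Unix-only illustration that uses os.statvfs and the ST_RDONLY flag; the helper name is_on_readonly_mount is hypothetical. It is an assumption that the read-only state is visible through the mount flags in the harness's environment. It reports the mount's read-only flag only and says nothing about file permissions.

```python
import os


def is_on_readonly_mount(path):
    """Return True if the filesystem holding `path` is mounted read-only.

    If `path` does not exist yet, the nearest existing ancestor is probed.
    """
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent
    return bool(os.statvfs(probe).f_flag & os.ST_RDONLY)
```

Logging this value for the safeoutputs directory just before the fallback emission would confirm or rule out the read-only-at-teardown explanation.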

References: §26673439829 · §26675076543 · §26675130908
Related to #35661

Generated by 🔍 [aw] Failure Investigator (6h) · opus48 4M ·

  • expires on Jun 6, 2026, 8:07 AM UTC
