[aw-failures] Copilot harness drops failure diagnostics — fallback safe-output emission fails with EROFS (read-only safeoutputs) after retry e
[Content truncated due to length]

Sub-issue of #35661 (token-budget exhaustion). This tracks a **distinct, separately-fixable robustness/observability bug** that the token-cap failures expose: when a `copilot`-engine run fails and the harness tries to auto-emit a fallback safe-output, the write fails on a **read-only filesystem**, so the agent's failure signal is silently dropped.

### Problem statement

After the Copilot CLI exits non-zero (here, because it exhausted its 3 `--continue` retries against the 25M effective-tokens cap), the `copilot-harness` attempts to auto-emit a `missing_tool` safe-output to record *why* the run failed. That emission fails:

```
missing_tool emission failed: EROFS: read-only file system,
open '/home/runner/work/_temp/gh-aw/safeoutputs/outputs.jsonl'
```

Because the write is dropped, the run surfaces only a generic non-zero exit. The structured failure reason (numerous permission denials, token-cap exhaustion) never reaches `agent_output.json` / safe-outputs, which degrades downstream triage and auto-issue quality.

### Affected workflows / runs (6h window, 2026-05-30)

EROFS fallback-emission failure observed in **2 of the 3** runs that hit the 25M token cap in this window:

| Workflow | Run | Token-cap peak | Harness state | EROFS on fallback |
|---|---|---|---|---|
| Daily Compiler Quality Check | [§26673439829](https://github.com/github/gh-aw/actions/runs/26673439829) | 25,003,302 / 25,000,000 | `permissionDeniedCount=11`, `hasNumerousPermissionDenied=true`, retries exhausted | ✅ yes |
| Copilot CLI Deep Research Agent | [§26675076543](https://github.com/github/gh-aw/actions/runs/26675076543) | 25,526,138 / 25,000,000 | retries exhausted (exitCode=1) | ✅ yes |
| Documentation Noob Tester | [§26675130908](https://github.com/github/gh-aw/actions/runs/26675130908) | 25,274,612 / 25,000,000 | 3 retries exhausted | not observed |

<details>
<summary>Harness diagnostic excerpt (run 26673439829)</summary>

```
[copilot-harness] attempt 1 failed: exitCode=1 isCAPIError400=false isMCPPolicyError=false
  isModelNotSupportedError=false isAuthError=false permissionDeniedCount=11
  hasNumerousPermissionDenied=true hasOutput=true retriesRemaining=3
[copilot-harness] missing_tool emission failed: EROFS: read-only file system,
  open '/home/runner/work/_temp/gh-aw/safeoutputs/outputs.jsonl'
... Last error: CAPIError: 429 Maximum effective tokens exceeded (25003302.30 / 25000000).
```

</details>
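
The fields in the excerpt above are the structured failure reason that is currently lost. As a rough sketch of how they could be recovered from raw log text, the following parses the `key=value` flags and the token-cap figures. The function names are hypothetical and the regular expressions are inferred only from the excerpt shown here, not from the harness source.

```python
import re

# key=value tokens as they appear in the [copilot-harness] diagnostic lines.
FLAG = re.compile(r"(\w+)=(\S+)")
# Token-cap message as it appears in the "Last error" line of the excerpt.
TOKEN_CAP = re.compile(r"Maximum effective tokens exceeded \(([\d.]+) / (\d+)\)")


def parse_harness_flags(text: str) -> dict:
    """Collect key=value flags, converting booleans and integers."""
    flags = {}
    for key, raw in FLAG.findall(text):
        if raw in ("true", "false"):
            flags[key] = raw == "true"
        elif raw.isdigit():
            flags[key] = int(raw)
        else:
            flags[key] = raw
    return flags


def parse_token_cap(text: str):
    """Return (used, cap) from the token-cap error, or None if absent."""
    match = TOKEN_CAP.search(text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))
```

Applied to the excerpt, `parse_harness_flags` yields `permissionDeniedCount` as an integer and `hasNumerousPermissionDenied` as a boolean, which is the shape a structured diagnostic record would need.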

### Probable root cause

By the time the harness runs its post-failure fallback-emission path, the `safeoutputs` output file (`/home/runner/work/_temp/gh-aw/safeoutputs/outputs.jsonl`) sits on a mount that is **read-only**, most likely because the safe-outputs collection window has closed or the bind-mount was switched to read-only during agent-container teardown. The fallback `missing_tool`/diagnostic write therefore hits `EROFS` and is dropped rather than retried or redirected.

This is independent of the token-cap root cause in #35661: even after #35661's remediations land, any agent that fails *after* the safe-outputs window closes will lose its diagnostic via the same path.

### Proposed remediation

1. **Keep the safe-outputs sink writable through fallback emission** — ensure `outputs.jsonl` (or an equivalent fallback file) remains writable for the harness's own post-failure emission, even after the agent container is torn down.
2. **Fail-safe fallback path** — if the primary `outputs.jsonl` is read-only, write the fallback diagnostic to a guaranteed-writable location (e.g. `$RUNNER_TEMP/gh-aw/agent/`) and have the `safe_outputs` job ingest it.
3. **Surface, don't swallow** — log the EROFS at `error` level into the step summary so the lost diagnostic is at least visible, and emit a `report_incomplete`-style signal so the failure reason is not reduced to a bare exit code.
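
Items 2 and 3 above could be combined roughly as follows. This is a minimal sketch in Python for illustration only, not the harness's actual code: `emit_diagnostic`, the `fallback-outputs.jsonl` file name, and the record shape are all hypothetical, and treating `ENOENT` as a redirect condition is an assumption about teardown behaviour.

```python
import errno
import json
import sys
from pathlib import Path

# errno values treated as "primary sink unusable, redirect the write".
# ENOENT is included on the assumption that the directory may already be
# gone after teardown; drop it if that case should remain a hard error.
REDIRECT_ERRNOS = {errno.EROFS, errno.EACCES, errno.EPERM, errno.ENOENT}


def emit_diagnostic(record: dict, primary: Path, fallback_dir: Path) -> Path:
    """Append record as one JSON line and return the file actually written."""
    line = json.dumps(record, sort_keys=True) + "\n"
    try:
        with open(primary, "a", encoding="utf-8") as sink:
            sink.write(line)
        return primary
    except OSError as exc:
        if exc.errno not in REDIRECT_ERRNOS:
            raise  # unrelated I/O failure: do not mask it
        # Surface the failed write instead of swallowing it.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        print(f"error: emission to {primary} failed ({name}); "
              f"redirecting to {fallback_dir}", file=sys.stderr)

    # Hypothetical fallback file that a later job would have to ingest.
    fallback_dir.mkdir(parents=True, exist_ok=True)
    fallback = fallback_dir / "fallback-outputs.jsonl"
    with open(fallback, "a", encoding="utf-8") as sink:
        sink.write(line)
    return fallback
```

Returning the path that was written lets the caller record whether the diagnostic landed in the primary sink or the fallback. The redirect only helps if the `safe_outputs` job is also taught to read the fallback location.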

### Success criteria / verification

- Over a 7-day window, zero `copilot`-engine runs log `missing_tool emission failed: EROFS` (or any `EROFS ... outputs.jsonl`).
- When an agent fails after retry exhaustion, the run's `agent_output.json` / safe-outputs contains the structured failure reason (token-cap, permission-denied count) rather than only a generic non-zero exit.
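
The first criterion could be checked mechanically over downloaded run logs. The sketch below assumes the logs are already available as text keyed by run ID; how they are fetched is out of scope, and both function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Matches any log line mentioning EROFS, which covers both
# "missing_tool emission failed: EROFS" and other EROFS write failures.
EROFS_LINE = re.compile(r"\bEROFS\b")


def count_erofs_lines(log_lines: Iterable[str]) -> int:
    """Count log lines that report an EROFS failure."""
    return sum(1 for line in log_lines if EROFS_LINE.search(line))


def runs_with_erofs(logs_by_run: Dict[str, str]) -> List[str]:
    """Return the sorted run IDs whose log contains at least one EROFS line."""
    return sorted(
        run_id
        for run_id, text in logs_by_run.items()
        if count_erofs_lines(text.splitlines()) > 0
    )
```

The criterion holds for a window when `runs_with_erofs` returns an empty list for every `copilot`-engine run in that window.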

### Context — parent #35661 token-cap is still recurring

This window (2026-05-30 ~01:40–07:40 UTC) confirms that the parent's 25M effective-tokens cap failure mode is **not abating**: 3 distinct `copilot`-engine workflows hit it (table above). One of them, **Copilot CLI Deep Research Agent**, was already named in #35661's original 2026-05-29 sample, which makes it a repeat offender. #35661's proposed remediations (per-workflow `max-turns`, MCP tool-surface trimming, `--no-resume` on retry after a 429) remain unimplemented and unverified. The related infra tracker #35780 (`awf-squid` unhealthy) did **not** recur in this window.

### Confidence & unknowns

- **High** confidence the EROFS emission failure is real and occurs in ≥2 distinct workflows (direct log evidence).
- **Medium** confidence on the exact mount-lifecycle cause (read-only at teardown) — not directly inspected; inferred from path + timing. The remediation should be validated against the actual mount setup in the agent job.
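
One way to validate the mount-lifecycle hypothesis is to probe the mount flags just before the fallback emission runs. The sketch below uses `os.statvfs` and `os.ST_RDONLY` from the Python standard library (Unix only). The helper names are hypothetical, and it assumes the read-only state is visible in the mount flags, which has not been confirmed for the agent job's setup.

```python
import os
from pathlib import Path


def nearest_existing(path: Path) -> Path:
    """Walk up from path until an existing file or directory is found."""
    path = Path(path).absolute()
    while not path.exists() and path.parent != path:
        path = path.parent
    return path


def is_read_only_mount(path: Path) -> bool:
    """Report whether the filesystem holding path is mounted read-only.

    statvfs needs an existing path, so the nearest existing ancestor is
    probed when the target file itself has not been created yet.
    """
    flags = os.statvfs(nearest_existing(path)).f_flag
    return bool(flags & os.ST_RDONLY)
```

Logging the result for the `outputs.jsonl` path at agent start and again immediately before fallback emission would show whether the mount flips to read-only in between.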

**References:** [§26673439829](https://github.com/github/gh-aw/actions/runs/26673439829) · [§26675076543](https://github.com/github/gh-aw/actions/runs/26675076543) · [§26675130908](https://github.com/github/gh-aw/actions/runs/26675130908)
Related to #35661







> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/26678271822) · opus48 4M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
> - [x] expires on Jun 6, 2026, 8:07 AM UTC
