feat(status): inline failure snippets on CI status surfaces#1247
Conversation
✨ Aspect Workflows Tasks📅 Sun Jun 21 18:13:19 UTC 2026
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d5a17fb47
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
ae86df4 to
4135de3
Compare
|
Added in Overflow notes with per-task attribution. When more failures exist than fit, the PR summary now shows e.g. Repro/Fix sections now use the same selection. Extracted a generic Tests added: snippet overflow attribution (incl. unshipped-over-cap and multi-task), the |
1624861 to
0c63a31
Compare
0c63a31 to
3391b21
Compare
…ng on CI
Two fixes so the inline test-log snippet shows the real failure output on CI:
1. Synthesized-JUnit fallback. sh_test/cc_test without native JUnit output get a
Bazel-synthesized test.xml whose <error message="exited with error code N">
is generic and whose real output is in <system-out>. The JUnit-first path was
emitting that terse message. `extract_junit_failure` now treats a low-signal
synthesized message with an empty element as a miss, falling back to the raw
test.log (where the real output is). Real <failure>/<error> with a body or a
genuine assertion message is unaffected. Regression tests added.
2. --keep_going on CI. The test task runs `aspect test //...`; a failed build
action (e.g. a genrule) aborted the whole build phase ("No test targets were
found"), so no test ran and no test snippet surfaced. Add `common:ci
--keep_going` to .bazelrc and inject `--config=ci` from .aspect/config.axl
when CI is set, so every failing target/test surfaces in one run. Drops the
now-redundant explicit --keep_going from the ephemeral test job.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cation marks
UX tweaks ahead of a fuller surface pass:
PR summary comment:
- Drop the per-snippet <details>; group snippets under "💥 Build Failures" and
"❌ Test Failures" sections.
- Header per snippet matches the Repro/Fix treatment:
❌ build (build-gha, build-gha-debug, …) · `//pkg:x` failed to build:
❌ test (test-gha, …) · `//pkg:y` failed:
(timeout bucket uses "timed out"). Task attribution is the deduped
contributing-task list, not a trailing "· a, b, c" on a <summary>.
Check run / Buildkite surfaces:
- Drop the per-row <details> — snippets already sit inside the bucket's
"Build Failure(s)" / "Test Failure(s)" <details>. Render inline, labeled by
a `code` span.
Both surfaces:
- When the log was windowed, show a `…` line above the snippet (head dropped)
and/or below it (tail dropped), instead of a trailing "output truncated"
note. log_snippet now reports trunc_head/trunc_tail (a byte-cap read counts
as a tail cut).
Tests updated for the grouped layout, Repro/Fix-style headers, no nested
<details>, and head/tail `…` markers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…duration
Restyle the PR-summary inline-snippet header:
- "❌ <rule-kind> `<label>` <verb>:" — uses the rule kind (genrule, sh_test, …)
and drops the inline task list from the header.
- Contributing task(s) move to a small `<sub>` line BELOW the snippet.
- Test failures fold their duration into the verb: "failed in 149ms" /
"timed out after 1m". Build-action failures have no reliable BES duration,
so they stay "failed to build".
duration_ms is now shipped per snippet by extract_failure_snippets (test wall
time; 0 for build actions) and threaded through selection into the header.
PR-comment overflow stays as the existing "+N more not shown" note (no table).
Check-run / Buildkite table↔snippet de-dup is deferred to the surface UX pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "+N more not shown" note double-counted: a single deduped dropped item
shared by M tasks reported total=1 but "(1 in a, 1 in b, 1 in c)" — the
per-task counts summed past the total. Per-task counts are meaningless once
items are deduped across tasks.
`_by_task_overflow` now returns `{total, tasks}` — `total` is the count of
distinct dropped items, `tasks` is the deduped, sorted set of tasks holding at
least one hidden item. `_format_overflow_note` renders "(across `a`, `b`)"
instead of "(N in `a`, …)". Applies to both the failure-snippet and repro/fix
overflow notes (shared helper).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
d477038 to
34d887a
Compare
…mment
PR summary comment:
- The small task line under a snippet now reads "in task <x>" / "in tasks
<a, b, …>" instead of a bare task list.
GitHub status check / Buildkite annotation (bazel_results):
- A failure that gets an inline snippet is removed from the bucket's <pre>
table — the snippet block represents it. The table now lists only failures
WITHOUT a snippet (over the snippet budget, or no readable log), so the
label/output isn't shown twice.
- Snippet blocks use the same header treatment as the PR comment:
"❌ <kind> `<label>` <verb>:" — rule kind, label, and verb with the test
duration folded in ("failed in 3s", "timed out after 1m"; "failed to build"
for actions, which have no reliable BES duration).
- The trim ladder sheds the heavy log body under the 65 KB cap but keeps each
failure's one-line header visible.
Tests updated for the new headers + no table/snippet duplication.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
34d887a to
46c17e2
Compare
Summary
On a failure, the CI status surfaces (GitHub/GitLab status checks, Buildkite annotations, PR summary comments) today show what failed — failed test labels + kind + duration, failed action labels + mnemonics — plus links to the raw logs. To see why something failed, a user has to click through to the Aspect Web UI or the CI artifact. The old rosetta-driven Marvin v1 inlined failed build-action messages; it never inlined test logs.
This brings the surfaces to (and past) that parity: it inlines a concise snippet of the actual failure output for both failed tests and failed build actions, directly on the surfaces.
New
lib/log_snippet.axlturns a log path into a bounded, cleaned snippet: it prefers a structured JUnittest.xml<failure>/<error>element when present (zero heuristics), otherwise applies tail-with-error-anchor windowing — strip ANSI/CR noise, anchor on the last failure marker, fall back to the tail — over a byte-capped prefix. Failed-action stderr (historically dropped to keep large inlined output off the Starlark heap) is now captured by path only (the BESFile.uri, resolved via the existingbb_clientdhelper) and read on demand at render time, so the bytes never land on the heap.<details>under each failure table via the sharedbazel_resultsrenderer. Snippets shed first in the existing 65 KB / 1 MiB size-cap trim ladder, so the failure tables users came for are never regressed.Changes are visible to end-users: yes
Suggested release notes
Test plan
.aspect/axl.axl:test_log_snippet+test_bazel_render_snippets(28 cases) — extraction, ANSI/CR stripping, marker-anchored windowing vs. tail fallback, byte cap, JUnit-first (incl. self-closing + entity unescape), missing-path degrade, and fullrender_check_outputintegration (inline test + action snippets, labeled<details>, off-by-default, budget-overflow note).github_status_comments_test.axl:_check_select_snippets— deterministic order, sticky last-sorting-survives-contention, hard cap at 5, stale-key drop, malformed skip, arender_bodyintegration for the🔎 Failure outputsection, and a regression test thatshown_snippetssurvives the state-block round-trip.aspect dev *-snapshotssuites stay green)aspect tests axl→ 824 passed; all 7 snapshot suites exit 0;cargo test -p aspect-cli→ 40 passed. Rendered Markdown eyeballed on both the check surface and the PR comment.🤖 Generated with Claude Code