feat(status): inline failure snippets on CI status surfaces #1247

Merged
gregmagolan merged 6 commits into main from feat/inline-failure-snippets on Jun 21, 2026

@gregmagolan

Member

Summary

On a failure, the CI status surfaces (GitHub/GitLab status checks, Buildkite annotations, PR summary comments) today show what failed — failed test labels + kind + duration, failed action labels + mnemonics — plus links to the raw logs. To see why something failed, a user has to click through to the Aspect Web UI or the CI artifact. The old rosetta-driven Marvin v1 inlined failed build-action messages; it never inlined test logs.

This brings the surfaces to (and past) that parity: it inlines a concise snippet of the actual failure output for both failed tests and failed build actions, directly on the surfaces.

New lib/log_snippet.axl turns a log path into a bounded, cleaned snippet. It prefers a structured JUnit test.xml <failure>/<error> element when present (zero heuristics). Otherwise it applies tail-with-error-anchor windowing over a byte-capped prefix of the log: strip ANSI/CR noise, anchor on the last failure marker, and fall back to the tail when no marker is found. Failed-action stderr (historically dropped to keep large inlined output off the Starlark heap) is now captured by path only (the BES File.uri, resolved via the existing bb_clientd helper) and read on demand at render time, so the bytes never land on the heap.
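The windowing step can be illustrated with a minimal Python sketch. This is not the code in lib/log_snippet.axl: the byte cap, line budget, marker list, and the amount of context kept before the anchor are all assumptions, and `snippet_from_text` is a hypothetical name.

```python
import re

# Hypothetical values; the PR does not state the real caps or markers.
BYTE_CAP = 64 * 1024
MAX_LINES = 20
FAILURE_MARKERS = ("FAILED", "Error:", "error:", "panicked at", "AssertionError")

# CSI escape sequences such as "\x1b[31m" (colors, cursor movement).
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def clean(text):
    """Drop ANSI escape sequences and carriage returns."""
    return ANSI_RE.sub("", text).replace("\r", "")


def snippet_from_text(raw, max_lines=MAX_LINES, byte_cap=BYTE_CAP):
    """Return a bounded snippet plus flags saying which ends were cut."""
    capped = len(raw) > byte_cap
    text = clean(raw[:byte_cap].decode("utf-8", errors="replace"))
    lines = text.splitlines()

    # Anchor on the LAST line that contains a failure marker, if any.
    anchor = None
    for i in range(len(lines) - 1, -1, -1):
        if any(marker in lines[i] for marker in FAILURE_MARKERS):
            anchor = i
            break

    if anchor is None:
        # Tail fallback: keep the last max_lines lines.
        start = max(0, len(lines) - max_lines)
        end = len(lines)
    else:
        # Assumed policy: a little context before the anchor, the rest after.
        start = max(0, anchor - max_lines // 4)
        end = min(len(lines), start + max_lines)

    return {
        "text": "\n".join(lines[start:end]),
        "trunc_head": start > 0,
        # A byte-capped read is reported as a tail cut.
        "trunc_tail": end < len(lines) or capped,
    }
```

For example, `snippet_from_text(b"a\nb\nc\nd\n", max_lines=2)` has no marker to anchor on, so it falls back to the last two lines and reports a dropped head.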

  • Status checks & Buildkite annotations render snippets as collapsed <details> under each failure table via the shared bazel_results renderer. Snippets are shed first in the existing 65 KB / 1 MiB size-cap trim ladder, so adding snippets never costs users the failure tables they came for.
  • PR summary comment is a cross-job rollup (not a details body), so each job extracts its own snippets (only it can read its local log/stderr paths) and ships them in its status artifact. The writer then makes a global selection that is sticky, deterministic, and hard-capped at 5, and persists it in the comment's state block. The chosen snippets therefore don't flip-flop as sibling jobs report in: an already-shown snippet is never evicted by a later-arriving one.
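The sticky selection for the PR summary comment could look roughly like the sketch below. It is an illustration in Python, not the AXL implementation: `select_sticky` here takes plain string keys, and the persisted state is assumed to be just the list of keys shown last time.

```python
def select_sticky(candidates, previously_shown, cap=5):
    """Pick up to `cap` keys; keys shown earlier keep their slot.

    candidates:       keys currently eligible (from all jobs reported so far)
    previously_shown: keys persisted from the last render (assumed state shape)
    """
    available = set(candidates)
    # Sticky: keep earlier picks that are still eligible; stale keys drop out.
    kept = [key for key in previously_shown if key in available][:cap]
    kept_set = set(kept)
    # Deterministic: fill the remaining slots in sorted order.
    for key in sorted(available):
        if len(kept) >= cap:
            break
        if key not in kept_set:
            kept.append(key)
            kept_set.add(key)
    return sorted(kept)
```

With this shape, if the first job fills all five slots, a later job whose keys sort earlier cannot evict any of them, which is the anti-flip-flop property described above.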

Changes are visible to end-users: yes

  • Searched for relevant documentation and updated as needed: yes (in-tree module docstrings)
  • Breaking change (forces users to change their own code or config): no
  • Suggested release notes appear below: yes

Suggested release notes

  • Failed tests and failed build actions now show an inline snippet of their output directly on CI status checks, Buildkite annotations, and the PR summary comment — no more clicking through to read the failure.

Test plan

  • New test cases added
    • .aspect/axl.axl: test_log_snippet + test_bazel_render_snippets (28 cases) — extraction, ANSI/CR stripping, marker-anchored windowing vs. tail fallback, byte cap, JUnit-first (incl. self-closing + entity unescape), missing-path degrade, and full render_check_output integration (inline test + action snippets, labeled <details>, off-by-default, budget-overflow note).
    • github_status_comments_test.axl: _check_select_snippets — deterministic order, sticky last-sorting-survives-contention, hard cap at 5, stale-key drop, malformed skip, a render_body integration for the 🔎 Failure output section, and a regression test that shown_snippets survives the state-block round-trip.
  • Covered by existing test cases (all 7 aspect dev *-snapshots suites stay green)
  • Local verification: aspect tests axl → 824 passed; all 7 snapshot suites exit 0; cargo test -p aspect-cli → 40 passed. Rendered Markdown eyeballed on both the check surface and the PR comment.
  • Remaining: exercise real build/test failures through live CI.

🤖 Generated with Claude Code

@aspect-workflows

aspect-workflows Bot commented Jun 18, 2026

✨ Aspect Workflows Tasks

📅 Sun Jun 21 18:13:19 UTC 2026

⚠️ 2 flagged tasks

  • ⚠️ delivery (delivery-gha-debug) · ⏱ 22.7s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Delivery complete (1 delivered · 2 warn · 3 skipped)
  • ⚠️ delivery (delivery-gha) · ⏱ 40.2s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Delivery complete (1 delivered · 2 warn · 3 skipped)

✅ 17 successful tasks

  • ✅ build (build-gha-debug) · ⏱ 2m 1s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (166 built)
  • ✅ build (build-gha) · ⏱ 2m 27s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (166 built)
  • ✅ build (build-gha-ephemeral) · ⏱ 55.3s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (9 built)
  • ✅ buildifier (buildifier-gha-debug) · ⏱ 36.5s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ buildifier (buildifier-gha) · ⏱ 1m · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (format-gha-debug) · ⏱ 1m 17s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (format-gha) · ⏱ 1m 58s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ gazelle (gazelle-gha-debug) · ⏱ 1m 24s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-from-source-gha-debug) · ⏱ 2m 3s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-from-source-gha) · ⏱ 1m 41s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-gha) · ⏱ 44.4s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ build (init-shell) · ⏱ 1m · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (10 built)
  • ✅ lint (lint-gha-debug) · ⏱ 37.9s · 🐙 GitHub Actions · ☑️ Check
    💬 Lint complete (clean)
  • ✅ lint (lint-gha) · ⏱ 50.7s · 🐙 GitHub Actions · ☑️ Check
    💬 Lint complete (clean)
  • ✅ test (test-gha-debug) · ⏱ 2m 31s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (26/26 passed · 25 cached)
  • ✅ test (test-gha) · ⏱ 1m 48s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (26/26 passed · 26 cached)
  • ✅ test (test-gha-ephemeral) · ⏱ 1m 28s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (1/1 passed)

🔁 Reproduce

⚠️ delivery (delivery-gha-debug · delivery-gha)

# --mode=always --track-state=false for off-runner with no state backend.
aspect delivery \
  --commit-sha=46c17e2c766c3ae5651a2d2bfa83272003fd0308 \
  --mode=always \
  --track-state=false \
  --dry-run=true

Install aspect: aspect.build/docs/cli/install


⏱ Last updated Sun Jun 21 18:16:21 UTC 2026 · 📊 GitHub API quota 1,643/15,000 (11% used, resets in 35m)
🚀 Powered by Aspect CLI (v0.0.0-dev)  |  Aspect Build · X · LinkedIn · YouTube

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d5a17fb47

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/aspect-cli/src/builtins/aspect/lib/log_snippet.axl Outdated
Comment thread crates/aspect-cli/src/builtins/aspect/feature/github_status_comments.axl Outdated
Comment thread crates/aspect-cli/src/builtins/aspect/lib/bazel_results.axl Outdated
@gregmagolan gregmagolan force-pushed the feat/inline-failure-snippets branch 2 times, most recently from ae86df4 to 4135de3 Compare June 18, 2026 03:10
@gregmagolan

Member Author

Added in a13f2455 (follow-up to review feedback + a feature request):

Overflow notes with per-task attribution. When more failures exist than fit, the PR summary now shows e.g. "_+ 5 more failures — not shown (3 in `test`, 2 in `build`)._". Each job ships its uncapped eligible-failure count alongside its snippets, so the count covers both failures the producer never shipped (over the per-task cap of 5) and shipped snippets that lose the global budget (5). The check surfaces (status checks / Buildkite) keep their existing flat note since they render a single task.

Repro/Fix sections now use the same selection. Extracted a generic _select_sticky core (shared by snippets and commands). The Reproduce and Fix sections dedup, sort deterministically (severity → kind → command → description, replacing the old -rank-only sort that let ties churn), sticky-select against a global cap, and collapse the rest into a per-task "N more not shown" note. The sticky sets persist in the comment state block (shown_repros / shown_fixes) so the shown commands don't flip-flop as sibling jobs report in — same anti-flip-flop contract as snippets.
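To show why a full sort key stops ties from churning, here is a small Python sketch of the dedup-then-sort step. The field names, the dedup key `(kind, command)`, and the convention that a lower `severity` sorts first are assumptions made for illustration; `dedup_and_sort` is a hypothetical name.

```python
def dedup_and_sort(commands):
    """Dedup commands, then order them by a total key so ties cannot churn.

    commands: list of dicts with "severity", "kind", "command", "description".
    """
    seen = set()
    unique = []
    for cmd in commands:
        key = (cmd["kind"], cmd["command"])  # assumed dedup identity
        if key not in seen:
            seen.add(key)
            unique.append(cmd)
    # severity -> kind -> command -> description: no two distinct entries tie,
    # so the result does not depend on the order in which jobs reported in.
    return sorted(
        unique,
        key=lambda c: (c["severity"], c["kind"], c["command"], c["description"]),
    )
```

Feeding the same commands in a different arrival order yields the same list, which is what a rank-only sort cannot guarantee when ranks are equal.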

Tests added: snippet overflow attribution (incl. unshipped-over-cap and multi-task), the _select_sticky core, _command_overflow attribution, _format_overflow_note formatting, and a state round-trip for the new sticky sets. aspect tests axl → 832 passed; all 7 snapshot suites + cargo test green.

@gregmagolan gregmagolan force-pushed the feat/inline-failure-snippets branch 2 times, most recently from 1624861 to 0c63a31 Compare June 21, 2026 05:38
@gregmagolan gregmagolan force-pushed the feat/inline-failure-snippets branch from 0c63a31 to 3391b21 Compare June 21, 2026 06:12
gregmagolan and others added 4 commits June 21, 2026 08:57
…ng on CI

Two fixes so the inline test-log snippet shows the real failure output on CI:

1. Synthesized-JUnit fallback. sh_test/cc_test without native JUnit output get a
   Bazel-synthesized test.xml whose <error message="exited with error code N">
   is generic and whose real output is in <system-out>. The JUnit-first path was
   emitting that terse message. `extract_junit_failure` now treats a low-signal
   synthesized message with an empty element as a miss, falling back to the raw
   test.log (where the real output is). Real <failure>/<error> with a body or a
   genuine assertion message is unaffected. Regression tests added.

2. --keep_going on CI. The test task runs `aspect test //...`; a failed build
   action (e.g. a genrule) aborted the whole build phase ("No test targets were
   found"), so no test ran and no test snippet surfaced. Add `common:ci
   --keep_going` to .bazelrc and inject `--config=ci` from .aspect/config.axl
   when CI is set, so every failing target/test surfaces in one run. Drops the
   now-redundant explicit --keep_going from the ephemeral test job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
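The synthesized-JUnit fallback in the commit above can be sketched in Python with the standard library's ElementTree. This is an illustration only: the low-signal pattern and the choice to prefer the element body over the `message` attribute are assumptions, and the real `extract_junit_failure` is written in AXL.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of Bazel's generic synthesized message.
LOW_SIGNAL_RE = re.compile(r"^exited with error code \d+$")


def extract_junit_failure(xml_text):
    """Return failure text from a JUnit XML string, or None on a miss.

    A miss tells the caller to fall back to the raw test.log.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    for elem in root.iter():
        if elem.tag not in ("failure", "error"):
            continue
        body = (elem.text or "").strip()
        message = (elem.get("message") or "").strip()
        if body:
            return body
        # An empty element with a generic message carries no real signal.
        if message and not LOW_SIGNAL_RE.match(message):
            return message
    return None
```

ElementTree handles self-closing elements and entity unescaping on its own, so `<failure message="expected 1 &lt; 0"/>` comes back as `expected 1 < 0`.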
…cation marks

UX tweaks ahead of a fuller surface pass:

PR summary comment:
  - Drop the per-snippet <details>; group snippets under "💥 Build Failures" and
    "❌ Test Failures" sections.
  - Header per snippet matches the Repro/Fix treatment:
      ❌ build (build-gha, build-gha-debug, …) · `//pkg:x` failed to build:
      ❌ test  (test-gha, …)                   · `//pkg:y` failed:
    (timeout bucket uses "timed out"). Task attribution is the deduped
    contributing-task list, not a trailing "· a, b, c" on a <summary>.

Check run / Buildkite surfaces:
  - Drop the per-row <details> — snippets already sit inside the bucket's
    "Build Failure(s)" / "Test Failure(s)" <details>. Render inline, labeled by
    a `code` span.

Both surfaces:
  - When the log was windowed, show a `…` line above the snippet (head dropped)
    and/or below it (tail dropped), instead of a trailing "output truncated"
    note. log_snippet now reports trunc_head/trunc_tail (a byte-cap read counts
    as a tail cut).

Tests updated for the grouped layout, Repro/Fix-style headers, no nested
<details>, and head/tail `…` markers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
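The head/tail truncation marks from the commit above amount to a small rendering rule. A Python sketch, assuming the extractor hands back the snippet text plus the two flags (`render_snippet` is a hypothetical name):

```python
def render_snippet(text, trunc_head, trunc_tail):
    """Wrap a snippet with "…" lines where the log was cut."""
    lines = []
    if trunc_head:
        lines.append("…")  # earlier output was dropped
    lines.append(text)
    if trunc_tail:
        lines.append("…")  # later output was dropped (or the read was byte-capped)
    return "\n".join(lines)
```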
…duration

Restyle the PR-summary inline-snippet header:
  - "❌ <rule-kind> `<label>` <verb>:" — uses the rule kind (genrule, sh_test, …)
    and drops the inline task list from the header.
  - Contributing task(s) move to a small `<sub>` line BELOW the snippet.
  - Test failures fold their duration into the verb: "failed in 149ms" /
    "timed out after 1m". Build-action failures have no reliable BES duration,
    so they stay "failed to build".

duration_ms is now shipped per snippet by extract_failure_snippets (test wall
time; 0 for build actions) and threaded through selection into the header.

PR-comment overflow stays as the existing "+N more not shown" note (no table).
Check-run / Buildkite table↔snippet de-dup is deferred to the surface UX pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
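The header format from the commit above, with the duration folded into the verb, might be built as follows. This is a Python sketch under assumptions: the bucket names (`"build"`, `"test"`, `"timeout"`) and the exact duration formatting rules are hypothetical, chosen only to reproduce the examples quoted in the commit message.

```python
def format_duration(ms):
    """Render a wall time compactly, e.g. 149ms, 3s, 1m, 1m 5s (assumed rules)."""
    if ms < 1000:
        return f"{ms}ms"
    seconds = ms // 1000
    if seconds < 60:
        return f"{seconds}s"
    minutes, rem = divmod(seconds, 60)
    return f"{minutes}m" if rem == 0 else f"{minutes}m {rem}s"


def snippet_header(kind, label, bucket, duration_ms=0):
    """Build the "❌ <rule-kind> `<label>` <verb>:" header line."""
    if bucket == "build":
        # Build actions have no reliable BES duration, so none is shown.
        verb = "failed to build"
    elif bucket == "timeout":
        verb = "timed out after " + format_duration(duration_ms)
    else:
        verb = "failed in " + format_duration(duration_ms)
    return f"❌ {kind} `{label}` {verb}:"
```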
The "+N more not shown" note double-counted: a single deduped dropped item
shared by M tasks reported total=1 but "(1 in a, 1 in b, 1 in c)" — the
per-task counts summed past the total. Per-task counts are meaningless once
items are deduped across tasks.

`_by_task_overflow` now returns `{total, tasks}` — `total` is the count of
distinct dropped items, `tasks` is the deduped, sorted set of tasks holding at
least one hidden item. `_format_overflow_note` renders "(across `a`, `b`)"
instead of "(N in `a`, …)". Applies to both the failure-snippet and repro/fix
overflow notes (shared helper).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
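The double-counting fix above comes down to counting distinct items once and listing tasks as a set. A Python sketch of that idea follows; the input shape (pairs of item key and task name) and the exact note wording are assumptions, and the real `_by_task_overflow` / `_format_overflow_note` helpers live in AXL.

```python
def by_task_overflow(dropped):
    """Summarize hidden items.

    dropped: iterable of (item_key, task_name) pairs, one per task that
             reported a hidden item (assumed input shape).
    """
    dropped = list(dropped)
    items = {key for key, _ in dropped}
    tasks = sorted({task for _, task in dropped})
    # total counts DISTINCT items, so an item shared by M tasks counts once.
    return {"total": len(items), "tasks": tasks}


def format_overflow_note(overflow):
    """Render the note, or an empty string when nothing is hidden."""
    if overflow["total"] == 0:
        return ""
    across = ", ".join(f"`{task}`" for task in overflow["tasks"])
    return f"_+ {overflow['total']} more not shown (across {across})_"
```

One dropped item shared by tasks `a`, `b`, and `c` now yields a total of 1 with three tasks listed, instead of per-task counts that sum to 3.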
@gregmagolan gregmagolan force-pushed the feat/inline-failure-snippets branch from d477038 to 34d887a Compare June 21, 2026 15:57
…mment

PR summary comment:
  - The small task line under a snippet now reads "in task <x>" / "in tasks
    <a, b, …>" instead of a bare task list.

GitHub status check / Buildkite annotation (bazel_results):
  - A failure that gets an inline snippet is removed from the bucket's <pre>
    table — the snippet block represents it. The table now lists only failures
    WITHOUT a snippet (over the snippet budget, or no readable log), so the
    label/output isn't shown twice.
  - Snippet blocks use the same header treatment as the PR comment:
    "❌ <kind> `<label>` <verb>:" — rule kind, label, and verb with the test
    duration folded in ("failed in 3s", "timed out after 1m"; "failed to build"
    for actions, which have no reliable BES duration).
  - The trim ladder sheds the heavy log body under the 65 KB cap but keeps each
    failure's one-line header visible.

Tests updated for the new headers + no table/snippet duplication.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
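The trim-ladder behavior described above (shed log bodies first, keep each one-line header) can be sketched as below. This Python illustration makes several assumptions: the cap is taken as 65 * 1024 bytes, bodies are shed from the last snippet backwards, and `render_with_cap` is a hypothetical name standing in for one rung of the real ladder.

```python
CAP_BYTES = 65 * 1024  # assumed byte value for the "65 KB" cap


def render_with_cap(table, snippets, cap=CAP_BYTES):
    """Render table + snippets, shedding snippet bodies until the output fits.

    snippets: list of (header, body) pairs. Headers are always kept.
    """
    def render(keep_bodies):
        parts = [table]
        for i, (header, body) in enumerate(snippets):
            parts.append(header)
            if i < keep_bodies:
                parts.append(body)
        return "\n".join(parts)

    out = render(len(snippets))
    # Try keeping all bodies, then one fewer, down to headers only.
    for keep in range(len(snippets), -1, -1):
        out = render(keep)
        if len(out.encode("utf-8")) <= cap:
            return out
    # Even headers-only is over the cap; later rungs would have to trim more.
    return out
```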
@gregmagolan gregmagolan force-pushed the feat/inline-failure-snippets branch from 34d887a to 46c17e2 Compare June 21, 2026 18:10
@gregmagolan gregmagolan merged commit ccce941 into main Jun 21, 2026
55 checks passed
@gregmagolan gregmagolan deleted the feat/inline-failure-snippets branch June 21, 2026 18:20
