ci: add AI-powered issue triage workflow#14
Conversation
Adds a GitHub Actions workflow that uses Anthropic's Claude API to
automatically classify new and edited issues by priority and area, post
a single bot comment with a root-cause hypothesis and suggested next
steps, and apply the matching labels.
- .github/workflows/issue-triage.yml: runs on issue open/reopen/edit,
skips bots and issues labeled `no-triage`, per-issue concurrency,
5-minute timeout.
- .github/scripts/triage.mjs: idempotent (single bot comment keyed by
HTML marker, label reconciliation on re-runs), strict output
validation against a closed taxonomy, prompt-cache enabled.
- .github/scripts/labels.json + bootstrap-labels.mjs +
workflow_dispatch workflow: one-shot label provisioning for
priority/P0..P3, area/{cli,config,auth,execution,reporting,tms,docs},
and no-triage opt-out.
- .github/AI_TRIAGE.md: maintainer-facing doc covering setup
(ANTHROPIC_API_KEY secret + optional ANTHROPIC_TRIAGE_MODEL var),
opt-out, and how to report bot mistakes.
Default model is claude-haiku-4-5-20251001 (cheap, fast); override with
the ANTHROPIC_TRIAGE_MODEL repo variable. Comments are clearly marked
as automated and not human-reviewed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue-triage workflow only fires on opened/reopened/edited, so already-open issues are not triaged automatically. This adds a manual backfill that iterates issues and runs the same triage logic. - .github/scripts/lib.mjs: extract shared triageIssue() — Anthropic call, output sanitization, idempotent comment, label reconciliation. Both per-issue and backfill entry points consume it. - .github/scripts/triage.mjs: thin wrapper — reads env, fetches issue labels, delegates to lib. - .github/scripts/backfill.mjs: lists issues (skips PRs returned by the issues endpoint), filters by no-triage / already-priorityed, triages sequentially, prints a summary. Honors dry-run. - .github/workflows/triage-backfill.yml: workflow_dispatch with inputs (state, only_unlabeled, limit 1-500, dry_run). 30-minute timeout; backfill concurrency group prevents overlapping runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed two follow-up commits to address backfilling existing issues:
Recommended one-time post-merge sequence
|
Addresses every reviewer finding (Claude code-reviewer + silent-failure
hunter + Codex adversarial). Chief design change: the bot is now
ASSISTIVE, not authoritative. After first triage, human-applied
priority/area labels are preserved on re-runs unless the operator
explicitly opts in via the new `force_relabel` backfill input.
Security
- Bot comment lookup checks both the marker AND the author
(`github-actions[bot]`), so attackers can no longer hijack the PATCH
target by forging the marker on their own comment.
- All third-party actions pinned to full commit SHAs (checkout v4.3.1,
setup-node v4.4.0).
- Workflow root `permissions: {}`, per-job grants only.
- Untrusted model output is wrapped in `::stop-commands::<token>`
blocks before being logged, so prompt-injected workflow commands
(`::add-mask::`, etc.) are inert.
- Issue title/body clamped to 500/8000 chars before sending to the
model.
- Edited trigger is gated on actual title or body change to prevent
edit-loop cost amplification.
Correctness
- Schema validation throws SchemaError on invalid model output instead
of silently coercing to `priority/P2` + `confidence: 0.5`. Failures
apply a `triage-failed` label, post a failure comment, and exit the
workflow non-zero.
- Order reversed: labels reconciled FIRST, comment posted SECOND. A
comment with stale labels is more misleading than a missing comment
with correct labels.
- Label DELETE only swallows 404 (the only benign case); other failures
are logged.
- `triage.mjs` now reads from `GITHUB_EVENT_PATH` instead of env-var
fan-out — no extra GET, no env-var quirks for large bodies.
- Missing `ANTHROPIC_API_KEY` now exits 1 (was: silently exit 0 with
yellow warning).
- Anthropic 401/403 are fatal and abort backfill; 429/529 retry once
with `retry-after` honoured; non-2xx responses are typed errors with
status, path, and body fields.
- JSON parsing uses assistant-prefill `{` and a balanced-brace scanner
instead of greedy regex, with full context on parse failure.
- `gh()` handles non-JSON 2xx and supports `allowedStatuses` for
callers that need to discriminate (e.g. 404 means "not present").
- Comment pagination capped at 20 pages with a warning on overflow.
- Backfill abort-on-auth, stable sort=created+asc pagination, distinct
counts for triaged_ok / triaged_failed / needs_info / skipped /
errors_other / aborted_at, no-op summary warning when nothing was
actually triaged.
- All `fetch` calls have a 60s `AbortSignal.timeout`.
Quality
- Empty/short issues short-circuit to `needs-info` label without burning
a model call (was: hallucinated RCA).
- next_steps stripped of leading markdown sigils (`#`, `>`, `|`, `-`)
to prevent comment-body breakouts.
- DEFAULT_MODEL constant centralised in lib.mjs (was: 4 places).
- Shared `gh()` reused by bootstrap-labels.mjs (no duplicate helper).
- bootstrap-labels validates REPO existence and prints a per-label
summary at the end (created / updated / failed counts).
- `labels.json` adds `triage-failed` and `needs-info`.
- AI_TRIAGE.md rewritten to document label-ownership semantics,
force_relabel, the failure path, and the security hardening.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed The headline change: assistive, not authoritativeCodex flagged the original design as a no-ship: the bot deleted human-applied
Other reviewer findings, all addressedSecurity
Correctness — schema failure now loud
Quality
Diff for this commit: 9 files changed, +699 / −258. Total PR: +1138 / −0. The PR is now ready for human review. After merge:
|
`applyMarkerLabel` previously deleted every managed label except the new marker, including `priority/*` and `area/*`. That meant a reporter edit which downgraded an already-triaged issue to needs-info or schema-failed state would erase a maintainer's earlier label correction — exactly the sticky-human-label invariant this branch is supposed to enforce. Fix: limit `applyMarkerLabel` to toggling the two flag labels (`triage-failed` ↔ `needs-info`). priority/* and area/* are now left untouched on the failure and needs-info paths. Found by codex adversarial-review (2026-04-30). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed What was broken
This contradicted the explicit "human labels are sticky" contract added in the previous commit. Fix
13 lines changed. |
Two findings from codex adversarial-review on the previous commit:
[high] Bot couldn't refresh its own labels.
After first triage, the bot-applied `priority/*` looked identical to
a human-applied one, so `applyTriageLabels` treated it as
human-managed and refused to update on later edits. Reporter could
add critical repro details that bumped priority, comment body would
refresh with the new classification, but the label stayed at the old
value. The "human-sticky" contract degraded into "everything is
sticky after first triage."
[medium] Label decisions used a stale event snapshot.
Workflow read labels at job start, then mutated 1-3s later after the
Anthropic call returned. A maintainer relabeling during that window
was invisible to the mutation logic.
Fixes:
1. Embed the bot's last-applied set in the marker comment as
`<!-- ai-triage-applied: priority/Pn,area/x -->`. On re-runs, parse
it and compare against the freshly-fetched managed labels:
- sets equal => bot owns them, free to refresh.
- sets differ => human (or another bot) touched, preserve.
Failure and needs-info comments carry the previous applied set
forward so a follow-up success can still recognise its own labels.
2. New `loadFreshState()` re-fetches issue labels and the existing bot
comment in parallel right before any label/comment mutation. The
`issue.labels` field passed in from the event payload is now used
ONLY for the early `no-triage` skip check; mutation logic always
uses fresh state.
3. The marker is updated to match what the bot just wrote — NOT what's
currently on the issue. On preservation the marker stays frozen at
the bot's previous applied set, so the next run still detects
"human-touched" rather than claiming the human's labels as its own.
4. AI_TRIAGE.md rewritten to document the new ownership model.
Found by codex adversarial-review (2026-04-30, third pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed [high] Bot couldn't refresh its own labels — fixedPreviously the bot's own Fix: bot now embeds its last-applied set in the marker comment as
The marker stays frozen at the bot's previous set on preservation, so the next run still detects the human's correction as not-its-own. Failure and needs-info comments carry the previous applied set forward so a transient model-output failure doesn't make the bot forget what it last applied. [medium] Stale event snapshot — fixedNew Doc
Diff: 2 files, +187 / −65 in this commit. Behavior matrix after this commit
|
Addresses the must-fix and high-leverage findings from the final combined review (code-reviewer + silent-failure-hunter + codex adversarial). Net: failure paths can't silently violate the contract, and the bot is no longer one typo or schema drift away from quietly mis-classifying issues. Must-fix: 1. validate() requires 1-3 areas, all in the closed enum, instead of silently filtering invalid values to an empty array. An empty/all- invalid `areas` field now throws SchemaError just like a bad priority — the bot no longer happily emits priority/Pn with no area/* routing label. 2. Failure paths re-check no-triage after loadFreshState. A maintainer adding `no-triage` while Anthropic was responding to a malformed prompt was previously visible only on the success path; now both schema-fail branches (and applyTriageLabels GhError) honour it. 3. applyTriageLabels GhError is routed through the same triage-failed recovery as SchemaError. Previously, a missing label in the repo (bootstrap workflow not run, maintainer renamed a label) would leave the issue partially mutated with no marker and a confusing red workflow run. 4. Anthropic 4xx (other than 429) is now fatal. A typo'd ANTHROPIC_TRIAGE_MODEL var would previously make the model API return 404 model_not_found, classed non-fatal, and a backfill of 100 issues would burn 100 calls all returning the same 404. Now the first 4xx aborts the loop. 5. loadFreshState detects issue-deleted (404/410) via a new IssueGoneError. triageIssue catches and exits cleanly with action `skipped:issue-gone` instead of spraying writes at a dead resource (which previously surfaced as confusing logs from the sequence of safeRemoveLabel 404s followed by an addLabels 404). High-leverage cheap fixes: 6. loadFreshState uses Promise.allSettled so failure context names the specific arm (issue GET vs comment list) instead of dropping one to the floor. 7. validate() strips HTML comments from summary/rca/next_steps so a model-emitted `<!-- ai-triage-applied: ... -->` cannot win the marker regex on a future template change. 8. APPLIED_MARKER_RE is anchored to the start of the comment body (`^<!-- ai-triage-bot:v1 -->\r?\n<!-- ai-triage-applied: ... -->`) — defence-in-depth alongside #7. Verified by smoke test that an attacker-prefix injection is correctly rejected. 10. postOrUpdateComment falls back from PATCH 404 to POST so a maintainer deleting the bot's comment mid-run results in a fresh comment, not a workflow failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed Must-fix
High-leverage
160 insertions / 47 deletions in |
Summary
Adds a GitHub Actions workflow that uses Anthropic's Claude API to automatically classify every new and edited issue by priority (P0–P3) and area (cli, config, auth, execution, reporting, tms, docs), apply the matching labels, and post a single bot comment with a root-cause hypothesis and suggested next steps.
The bot comment is clearly marked as automated and not human-reviewed; maintainers can relabel, edit, or delete at any time. Adding the
no-triagelabel opts an issue out of the workflow.What's included
.github/workflows/issue-triage.yml— runs onissues: opened, reopened, edited. Skips bot authors andno-triage-labeled issues. Per-issue concurrency, 5-minute timeout,issues: write+contents: readonly..github/scripts/triage.mjs— calls Claude API, validates output against a closed taxonomy (drops anything off-list), posts/updates a single bot comment keyed by<!-- ai-triage-bot:v1 -->, reconciles labels on re-runs. Uses prompt caching for the system prompt..github/scripts/labels.json+bootstrap-labels.mjs+.github/workflows/bootstrap-labels.yml— one-shotworkflow_dispatchlabel provisioning. Idempotent (creates or patches color/description). Run this once after merge..github/AI_TRIAGE.md— maintainer-facing doc: setup, opt-out, model override, how to report bot mistakes.Setup required after merge
ANTHROPIC_API_KEYrepository secret (Settings → Secrets and variables → Actions).ANTHROPIC_TRIAGE_MODELrepository variable to override the default model. Default:claude-haiku-4-5-20251001(fast, ~fractions of a cent per issue). Set to e.g.claude-sonnet-4-6for higher-quality RCA.priority/P0..P3,area/cli|config|auth|execution|reporting|tms|docs, andno-triage.Design choices
bug,enhancement,question, etc.) are kept; the workflow only adds newpriority/*andarea/*namespaces, and infers an issuetypefor the comment without auto-applying abug/enhancementlabel (humans still do that).priority/*andarea/*removed, new ones added).Test plan
ANTHROPIC_API_KEYsecret.Bootstrap triage labelsworkflow once and confirms 12 labels are created.no-triageto a fresh issue; verify the workflow skips it.ANTHROPIC_TRIAGE_MODELto a Sonnet model and confirm the comment footer reflects the new model.🤖 Generated with Claude Code