
Add fix-first heuristic to multi-review#45

Merged
TechNickAI merged 2 commits into main from feature/fix-first-multi-review
Apr 1, 2026

Conversation

@TechNickAI
Owner

Summary

  • Review findings now auto-classified as AUTO-FIX (obvious, unambiguous) or ASK (needs judgment)
  • Auto-fixes applied immediately without asking; ASK items batched into single question with recommendations
  • Inspired by gstack's fix-first review pattern — reduces round-trips and lets /ship flow without unnecessary pauses

Changes

  • plugins/core/commands/multi-review.md — v2.2.0 → v3.0.0
    • New 6-step fix-first synthesis workflow (Collect → Classify → Fix → Ask → Apply → Report)
    • Explicit AUTO-FIX criteria (unused imports, missing await, empty catches, etc.)
    • Explicit ASK criteria (architectural, security-sensitive, behavior changes)
    • Updated output format: auto-fixed vs user-approved vs wontfix vs deferred
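The classify step described above can be sketched as a simple rule table. This is an illustrative Python sketch, not the actual spec: the bucket names come from the PR description, but the finding "kinds" and the default-to-ASK behavior are assumptions for the example.

```python
# Hypothetical sketch of the Classify step. Finding kinds are illustrative,
# paraphrased from the AUTO-FIX / ASK criteria listed in this PR.
AUTO_FIX_KINDS = {"unused-import", "missing-await", "empty-catch"}
ASK_KINDS = {"architectural", "security-sensitive", "behavior-change"}

def classify(kind: str) -> str:
    """Route a review finding to a bucket."""
    if kind in AUTO_FIX_KINDS:
        return "AUTO-FIX"
    # Explicit ASK kinds and anything unrecognized both route to ASK,
    # so ambiguous findings are never auto-fixed (an assumption here;
    # the spec's actual fallback behavior is discussed in review below).
    return "ASK"

print(classify("unused-import"))        # AUTO-FIX
print(classify("security-sensitive"))   # ASK
print(classify("mystery-finding"))      # ASK
```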

Test plan

  • Run /multi-review on a branch with known issues — verify AUTO-FIX items are fixed without asking
  • Verify ASK items are batched into single question
  • Run /multi-review from /autotask — verify concise output

🤖 Generated with Claude Code

Review findings are now auto-classified as AUTO-FIX (obvious, unambiguous)
or ASK (needs user judgment). Auto-fixes are applied immediately; ASK items
are batched into a single question with recommendations. This reduces
round-trips and lets /ship and /autotask flow without unnecessary pauses.

Inspired by gstack's fix-first review pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Apr 1, 2026

Code Review: Add fix-first heuristic to multi-review

Overall this is a well-designed improvement. The fix-first pattern meaningfully reduces round-trips and the classification criteria are thoughtfully chosen. A few things worth considering:


Potential Issues

Missing await as AUTO-FIX is risky

Fire-and-forget patterns are legitimate (background jobs, non-critical side effects), and adding await changes runtime behavior — it's not purely stylistic. A missing await that causes a crash is probably obvious from context, but the blanket rule could cause unintended behavior changes. Consider narrowing to: "Missing await where the result is used or error must be caught" — or moving to ASK.
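The distinction the reviewer is drawing can be shown in Python's asyncio terms (the spec itself is language-agnostic; function names here are invented for illustration):

```python
# Illustrative analogue of the narrowed rule: a missing await is only an
# unambiguous AUTO-FIX when the result is used; fire-and-forget is a
# deliberate pattern where adding await changes behavior.
import asyncio

async def fetch_user(uid):
    return {"id": uid}

async def log_metric(name):
    pass  # non-critical side effect

async def main():
    # Result is used: `user = fetch_user(1)` without await would bind a
    # coroutine, not a dict, and crash below. Clear AUTO-FIX territory.
    user = await fetch_user(1)

    # Fire-and-forget is intentional: adding await here would serialize
    # the call and change timing. This belongs in ASK, not AUTO-FIX.
    asyncio.create_task(log_metric("page_view"))
    await asyncio.sleep(0)  # let the background task run before exit

    return user["id"]

print(asyncio.run(main()))  # 1
```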

Duplicate code as AUTO-FIX

Deduplication requires choosing an abstraction (where does it live? what's the interface?), which is an architectural judgment call. Multiple agents flagging it independently increases confidence it's real duplication, but the fix still involves design decisions. This feels more like an ASK with a strong recommendation, not an AUTO-FIX.

No explicit guard for test files

AUTO-FIX criteria don't mention test files, but the ASK criteria do say "Anything that touches test assertions or expected values." An empty catch block inside a test helper, or an unused import in a test file, would qualify as AUTO-FIX by the current rules. Worth adding an explicit note like: "In test files, apply AUTO-FIX only for import/formatting issues; route logic changes to ASK."
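The suggested guard is a one-line predicate; this sketch assumes conventional test-file naming (`test_*`, `*.test.*`, `*.spec.*`), which is not specified anywhere in the PR:

```python
# Hypothetical test-file guard from the reviewer's suggestion: in test
# files, allow AUTO-FIX only for import/formatting findings.
SAFE_IN_TESTS = {"unused-import", "formatting"}

def is_test_file(path: str) -> bool:
    name = path.rsplit("/", 1)[-1]
    return name.startswith("test_") or ".test." in name or ".spec." in name

def allow_auto_fix(path: str, kind: str) -> bool:
    if is_test_file(path):
        return kind in SAFE_IN_TESTS  # logic changes in tests route to ASK
    return True  # non-test files fall through to the normal criteria

print(allow_auto_fix("src/app.py", "empty-catch"))           # True
print(allow_auto_fix("tests/test_app.py", "empty-catch"))    # False
print(allow_auto_fix("tests/test_app.py", "unused-import"))  # True
```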


Minor Observations

Ambiguous cases have no fallback: When a finding doesn't clearly fit either bucket, what's the default? Adding "When in doubt, classify as ASK" would prevent over-eager auto-fixing.

ASK format example uses hardcoded agent names (security-reviewer, architecture-auditor). These may not exist in all projects. A note like "use the actual agent name/type that caught it" would clarify intent.

Output format (only if ASK items were presented) works but is slightly awkward when all items were auto-fixed — the User-approved fixes (0 issues) section would just be absent. Consider an explicit note that this section is omitted when not applicable, so LLMs following the format don't generate an empty section.


What's Working Well

  • The 6-step Collect → Classify → Fix → Ask → Apply → Report structure is clear and easy to follow
  • AUTO-FIX criteria are well-scoped for truly low-risk items
  • Batching ASK items into a single question rather than N interruptions is the right UX
  • The /ship and /autotask concise output note is a good ergonomic touch
  • v3.0.0 version bump is appropriate — this is a meaningful behavior change, not a tweak

The missing await and duplicate code AUTO-FIX items are the highest-signal concerns — both can produce unexpected behavior changes without explicit user approval. The rest are polish suggestions.

🤖 Generated with Claude Code


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 876ea1caf6


Comment on lines +113 to +114
4. **Ask**: If any ASK items exist, present them in a single batch with context and a
recommendation for each. Format:

P1 Badge Avoid pausing /autotask for review decisions

This step makes /multi-review stop and ask the user whenever an ASK item exists, but /autotask is explicitly defined as an autonomous workflow (“without supervision” and “pause only for deal-killers” in plugins/core/commands/autotask.md:10-12,177-181). Because /autotask invokes /multi-review in balanced/deep paths, this new prompt behavior can block unattended runs on routine design/performance trade-offs that are not deal-killers.


5. **Apply user decisions**: Fix items the user approves, mark others as wontfix.


P2 Badge Preserve deferred path for large-scope findings

The workflow now instructs the model to mark non-approved ASK items as wontfix, but it no longer includes an execution step to create follow-up tasks for large-scope issues (which the philosophy section still calls out as a distinct outcome). This regression can cause valid large-scope findings to be dropped as wontfix instead of being tracked, despite the output format still advertising a Deferred category.


- Narrow missing-await AUTO-FIX: only when result is used or error must
  be caught; fire-and-forget patterns are ASK (behavior change, not style)
- Move duplicate-code dedup to ASK: abstraction design is a judgment call
- Add test-file guard: only import/formatting AUTO-FIX in test files
- Add DEFER as 3rd classification path for large-scope findings
- Add 'when in doubt, classify as ASK' rule
- Clarify agent name note in ASK format example
- Omit empty output sections rather than showing (0 issues)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Apr 1, 2026

Review: Add fix-first heuristic to multi-review

The fix-first workflow is a solid improvement — the 6-step structure is clear and the AUTO-FIX/ASK/DEFER classification is well-scoped. Many of the concerns in the previous review (cursor's comment) appear to have already been addressed in the current version (fire-and-forget caveat, test file guard, deduplication → ASK, "when in doubt" fallback). Adding a few observations not yet covered:


Inconsistency between <philosophy> and new workflow

<philosophy> still describes a v2-era decision model:

"fix now, wontfix (with reason), or large scope"

This conflicts with the new 3-category system (AUTO-FIX / ASK / DEFER). An LLM reading both sections could get confused about which model to follow. The philosophy section's language should be updated to match — or at minimum, note that the <execution> section is authoritative.

Missing severity → classification relationship

Step 1 (Collect) says "group by severity" but severity is never used as an input to the Classify step. A critical-severity finding in test assertion code still routes to ASK — fine, but the relationship (or lack thereof) between severity and classification should be stated. Otherwise the grouping in step 1 feels disconnected from the workflow.

ASK items block /autotask pipelines

When called from /autotask (no interactive user), the Ask step (step 4) would stall waiting for a response that never comes. The /ship and /autotask note at the end mentions concise output, but doesn't address what to do with ASK items in non-interactive contexts. Consider adding: "When called from /autotask or other automated contexts, treat ASK items as DEFER — create tasks for them rather than blocking on user input."
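The suggested fallback is mechanical enough to sketch. Everything here is hypothetical (function names, the `interactive` flag, the decision strings); it only illustrates the shape of "ASK degrades to DEFER when no user is present":

```python
# Sketch of the proposed non-interactive fallback: ASK items become DEFER
# (tracked follow-up tasks) instead of blocking an autonomous pipeline.
def resolve_ask_items(ask_items, interactive, prompt_user=None, create_task=None):
    resolved = []
    for item in ask_items:
        if interactive:
            decision = prompt_user(item)  # user answers "fix" or "wontfix"
        else:
            decision = "defer"  # autonomous caller (e.g. /autotask): never block
            create_task(item)   # keep the finding tracked, not dropped
        resolved.append((item, decision))
    return resolved

created = []
out = resolve_ask_items(["extract dedup helper"], interactive=False,
                        create_task=created.append)
print(out)      # [('extract dedup helper', 'defer')]
print(created)  # ['extract dedup helper']
```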

Agent attribution asymmetry in output format

The Auto-fixed section includes (agent that caught it) but User-approved fixes does not. Both sections should carry attribution for consistency and auditability — especially useful when reviewing a summary to understand which reviewers were most useful.

Minor: </execution> closing tag indentation

The </execution> tag has leading whitespace (looks like 3 spaces), which is inconsistent with other closing tags in the file. Cosmetic, but worth fixing for consistency.


What's working well

  • The cursor bot's highest-priority concerns (missing await, test file guard, deduplication) are cleanly addressed in the current version
  • Batching ASK items into a single question is the right call — minimizes interruptions
  • DEFER creating a follow-up task rather than just declining is a good practice
  • "Omit empty sections" in the report step prevents noise in clean runs

The philosophy/execution inconsistency and the autotask blocking concern are the two worth resolving before merge. The rest are polish.

🤖 Generated with Claude Code


@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


interpolation → RECOMMENDATION: parameterize, because user input flows here
2. [Agent: architecture-auditor] Service layer bypasses repository pattern
→ RECOMMENDATION: keep as-is, this is a one-off admin endpoint


ASK step blocks autonomous callers with no fallback

High Severity

The new ASK step (step 4) pauses for user input, but /autotask calls /multi-review as part of a fully autonomous "without supervision" workflow. The output-format section mentions /autotask callers only for concise output but gives no guidance on handling ASK items in autonomous contexts. The old workflow was fully autonomous (agent decided everything); the new one introduces a blocking interaction with no fallback, breaking the autonomous pipeline.


fire-and-forget patterns — those require judgment and belong in ASK)
- Obvious null/undefined checks where crash is certain
- Wrong casing or naming convention violations
- Missing error propagation (empty catch blocks, swallowed errors)


Empty catch auto-fix contradicts low-risk behavior-change criteria

Medium Severity

"Missing error propagation (empty catch blocks, swallowed errors)" is classified as AUTO-FIX, but propagating previously-swallowed errors fundamentally changes program behavior — turning silent operations into potential crashes. This directly contradicts the AUTO-FIX criteria of "low risk of changing behavior in unexpected ways" and the ASK guidance to classify items with "risk of unintended behavior change" as ASK.
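The behavior change the bot describes is easy to demonstrate in Python terms (a swallowed exception standing in for an empty catch block; function names are invented):

```python
# Demonstrates why propagating a previously-swallowed error is a behavior
# change, not a style fix: a silent no-op path becomes a crash path.
def parse_swallowed(s):
    try:
        return int(s)
    except ValueError:
        pass  # swallowed error: caller sees None and carries on
    return None

def parse_propagated(s):
    return int(s)  # after the "auto-fix": the same input now raises

print(parse_swallowed("oops"))  # None -- program continues silently

try:
    parse_propagated("oops")
except ValueError:
    print("raises")  # the fix turned a silent path into a crash path
```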


1. **Collect**: Gather all findings, deduplicate across agents, group by severity, note
which agent caught each issue

2. **Classify** each finding as AUTO-FIX, ASK, or DEFER:


Classification system has no path for agent-recognized false positives

Medium Severity

Step 2 requires classifying every finding as AUTO-FIX, ASK, or DEFER — but the old workflow's wontfix disposition (agent-recognized false positives) was dropped from classification. The output format still has a Wontfix section, and the <philosophy> block still lists "Wontfix: suggestion doesn't apply given full context" as a valid reason. The only way wontfix items arise now is via user-declined ASK items (step 5), so agent-identified false positives either get silently dropped or needlessly burden the user as ASK items.


@TechNickAI merged commit 7251c74 into main Apr 1, 2026
6 checks passed
@TechNickAI deleted the feature/fix-first-multi-review branch April 1, 2026 14:44