Add fix-first heuristic to multi-review #45
Review findings are now auto-classified as AUTO-FIX (obvious, unambiguous) or ASK (needs user judgment). Auto-fixes are applied immediately; ASK items are batched into a single question with recommendations. This reduces round-trips and lets /ship and /autotask flow without unnecessary pauses. Inspired by gstack's fix-first review pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review: Add fix-first heuristic to multi-review

Overall this is a well-designed improvement. The fix-first pattern meaningfully reduces round-trips and the classification criteria are thoughtfully chosen. A few things worth considering:

Potential Issues

Fire-and-forget patterns are legitimate (background jobs, non-critical side effects), and adding

Duplicate code as AUTO-FIX

Deduplication requires choosing an abstraction (where does it live? what's the interface?), which is an architectural judgment call. Multiple agents flagging it independently increases confidence it's real duplication, but the fix still involves design decisions. This feels more like an ASK with a strong recommendation, not an AUTO-FIX.

No explicit guard for test files

AUTO-FIX criteria don't mention test files, but the ASK criteria do say "Anything that touches test assertions or expected values." An empty catch block inside a test helper, or an unused import in a test file, would qualify as AUTO-FIX by the current rules. Worth adding an explicit note like: "In test files, apply AUTO-FIX only for import/formatting issues; route logic changes to ASK."

Minor Observations

Ambiguous cases have no fallback: When a finding doesn't clearly fit either bucket, what's the default? Adding "When in doubt, classify as ASK" would prevent over-eager auto-fixing.

ASK format example uses hardcoded agent names (

Output format

What's Working Well
🤖 Generated with Claude Code
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 876ea1caf6
| 4. **Ask**: If any ASK items exist, present them in a single batch with context and a
| recommendation for each. Format:
Avoid pausing /autotask for review decisions
This step makes /multi-review stop and ask the user whenever an ASK item exists, but /autotask is explicitly defined as an autonomous workflow (“without supervision” and “pause only for deal-killers” in plugins/core/commands/autotask.md:10-12,177-181). Because /autotask invokes /multi-review in balanced/deep paths, this new prompt behavior can block unattended runs on routine design/performance trade-offs that are not deal-killers.
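One way the concern could be addressed, sketched here under assumptions and not taken from the PR: the caller passes an `interactive` flag, and when it is false each ASK item falls back to its own recommendation instead of prompting. The names `AskItem`, `resolve_ask_items`, `interactive`, and `prompt_user` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AskItem:
    """A finding that needs judgment (hypothetical shape, not from the PR)."""
    agent: str
    summary: str
    recommendation: str  # e.g. "fix" or "keep as-is"


def resolve_ask_items(
    items: List[AskItem],
    interactive: bool,
    prompt_user: Callable[[List[AskItem]], List[str]],
) -> List[Tuple[AskItem, str, str]]:
    """Return (item, decision, decided_by) for each ASK item.

    Interactive callers get one batched prompt. Autonomous callers never
    block: each item falls back to its own recommendation.
    """
    if not items:
        return []
    if interactive:
        decisions = prompt_user(items)  # single batched question
        return [(item, d, "user") for item, d in zip(items, decisions)]
    return [(item, item.recommendation, "recommendation") for item in items]
```

Recording who made each decision would let the final report flag choices made without a human, so they can be revisited after an unattended run.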
| → RECOMMENDATION: keep as-is, this is a one-off admin endpoint
| ```
|
| 5. **Apply user decisions**: Fix items the user approves, mark others as wontfix.
Preserve deferred path for large-scope findings
The workflow now instructs the model to mark non-approved ASK items as wontfix, but it no longer includes an execution step to create follow-up tasks for large-scope issues (which the philosophy section still calls out as a distinct outcome). This regression can cause valid large-scope findings to be dropped as wontfix instead of being tracked, despite the output format still advertising a Deferred category.
- Narrow missing-await AUTO-FIX: only when result is used or error must be caught; fire-and-forget patterns are ASK (behavior change, not style)
- Move duplicate-code dedup to ASK: abstraction design is a judgment call
- Add test-file guard: only import/formatting AUTO-FIX in test files
- Add DEFER as 3rd classification path for large-scope findings
- Add 'when in doubt, classify as ASK' rule
- Clarify agent name note in ASK format example
- Omit empty output sections rather than showing (0 issues)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
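The rules in this commit message can be read as a small decision procedure. The sketch below is illustrative only: the actual change edits a prompt file, not code, and `Finding`, its fields, the kind names, and the test-file naming convention are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    AUTO_FIX = "AUTO-FIX"
    ASK = "ASK"
    DEFER = "DEFER"


@dataclass
class Finding:
    """Hypothetical finding record; field names are illustrative only."""
    kind: str                             # e.g. "unused-import", "missing-await"
    path: str
    large_scope: bool = False
    result_or_error_needed: bool = False  # only meaningful for "missing-await"


# Assumed sets, chosen for illustration.
MECHANICAL_KINDS = {"unused-import", "formatting"}
UNAMBIGUOUS_KINDS = MECHANICAL_KINDS | {"naming-convention"}


def is_test_file(path: str) -> bool:
    # Assumed naming convention; real projects vary.
    name = path.rsplit("/", 1)[-1]
    return name.startswith("test_") or ".test." in name or "/tests/" in path


def classify(f: Finding) -> Disposition:
    if f.large_scope:
        return Disposition.DEFER
    if is_test_file(f.path):
        # Test-file guard: only import/formatting fixes are automatic.
        return Disposition.AUTO_FIX if f.kind in MECHANICAL_KINDS else Disposition.ASK
    if f.kind == "missing-await":
        # Fire-and-forget may be intentional, so it needs judgment.
        return Disposition.AUTO_FIX if f.result_or_error_needed else Disposition.ASK
    if f.kind in UNAMBIGUOUS_KINDS:
        return Disposition.AUTO_FIX
    # Includes duplicate-code and anything unrecognized: when in doubt, ASK.
    return Disposition.ASK
```

Putting the ASK return last makes the "when in doubt" rule the structural default: a new finding kind has to be added to an allow-list before it can be fixed without a question.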
Review: Add fix-first heuristic to multi-review

The fix-first workflow is a solid improvement: the 6-step structure is clear and the AUTO-FIX/ASK/DEFER classification is well-scoped. Many of the concerns in the previous review (cursor's comment) appear to have already been addressed in the current version (fire-and-forget caveat, test file guard, deduplication → ASK, "when in doubt" fallback). Adding a few observations not yet covered:

Inconsistency between
Cursor Bugbot has reviewed your changes and found 3 potential issues.
| interpolation → RECOMMENDATION: parameterize, because user input flows here
| 2. [Agent: architecture-auditor] Service layer bypasses repository pattern
| → RECOMMENDATION: keep as-is, this is a one-off admin endpoint
| ```
ASK step blocks autonomous callers with no fallback
High Severity
The new ASK step (step 4) pauses for user input, but /autotask calls /multi-review as part of a fully autonomous "without supervision" workflow. The output-format section mentions /autotask callers only for concise output but gives no guidance on handling ASK items in autonomous contexts. The old workflow was fully autonomous (agent decided everything); the new one introduces a blocking interaction with no fallback, breaking the autonomous pipeline.
| fire-and-forget patterns — those require judgment and belong in ASK)
| - Obvious null/undefined checks where crash is certain
| - Wrong casing or naming convention violations
| - Missing error propagation (empty catch blocks, swallowed errors)
Empty catch auto-fix contradicts low-risk behavior-change criteria
Medium Severity
"Missing error propagation (empty catch blocks, swallowed errors)" is classified as AUTO-FIX, but propagating previously-swallowed errors fundamentally changes program behavior — turning silent operations into potential crashes. This directly contradicts the AUTO-FIX criteria of "low risk of changing behavior in unexpected ways" and the ASK guidance to classify items with "risk of unintended behavior change" as ASK.
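A minimal illustration of why this counts as a behavior change rather than a style fix. The functions below are hypothetical and exist only to show the effect: the same failing input returns a default in the first version and raises in the second.

```python
def load_optional_config(read):
    """Original: errors are swallowed, callers always get a dict."""
    try:
        return read()
    except OSError:
        pass
    return {}


def load_optional_config_propagating(read):
    """After a naive auto-fix: the same failure now reaches the caller."""
    try:
        return read()
    except OSError:
        raise


def missing_file():
    # Stand-in for reading a config file that does not exist.
    raise FileNotFoundError("config.toml")
```

Any caller that relied on the empty-dict fallback would start failing after the "fix", which is the kind of unintended behavior change the ASK criteria are meant to catch.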
| 1. **Collect**: Gather all findings, deduplicate across agents, group by severity, note
| which agent caught each issue
|
| 2. **Classify** each finding as AUTO-FIX, ASK, or DEFER:
Classification system has no path for agent-recognized false positives
Medium Severity
Step 2 requires classifying every finding as AUTO-FIX, ASK, or DEFER — but the old workflow's wontfix disposition (agent-recognized false positives) was dropped from classification. The output format still has a Wontfix section, and the <philosophy> block still lists "Wontfix: suggestion doesn't apply given full context" as a valid reason. The only way wontfix items arise now is via user-declined ASK items (step 5), so agent-identified false positives either get silently dropped or needlessly burden the user as ASK items.
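One possible shape for restoring that path, offered as a sketch and not as the PR's design: allow an agent-side WONTFIX disposition, but only when it carries a stated reason, and group results so that empty sections are omitted from the output. The tuple layout and the rule that reasonless dismissals fall back to ASK are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (summary, disposition, reason); a hypothetical triage record.
Triaged = Tuple[str, str, Optional[str]]


def group_for_output(triaged: List[Triaged]) -> Dict[str, List[str]]:
    """Group findings by disposition, omitting empty sections."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for summary, disposition, reason in triaged:
        if disposition == "WONTFIX" and not reason:
            # An agent-side dismissal without a stated reason is not
            # trustworthy enough to skip the user, so route it to ASK.
            disposition = "ASK"
        line = f"{summary} ({reason})" if reason else summary
        sections[disposition].append(line)
    return dict(sections)
```

With this shape, false positives show up in the Wontfix section with their justification, and the user is only asked about dismissals the agent could not explain.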


Summary

Changes
- plugins/core/commands/multi-review.md: v2.2.0 → v3.0.0

Test plan
- /multi-review on a branch with known issues: verify AUTO-FIX items are fixed without asking
- /multi-review from /autotask: verify concise output

🤖 Generated with Claude Code