
Add fix-first heuristic to multi-review#45

Merged
TechNickAI merged 2 commits into main from feature/fix-first-multi-review
Apr 1, 2026

Conversation

@TechNickAI
Owner

Summary

  • Review findings now auto-classified as AUTO-FIX (obvious, unambiguous) or ASK (needs judgment)
  • Auto-fixes applied immediately without asking; ASK items batched into single question with recommendations
  • Inspired by gstack's fix-first review pattern — reduces round-trips and lets /ship flow without unnecessary pauses

Changes

  • plugins/core/commands/multi-review.md — v2.2.0 → v3.0.0
    • New 6-step fix-first synthesis workflow (Collect → Classify → Fix → Ask → Apply → Report)
    • Explicit AUTO-FIX criteria (unused imports, missing await, empty catches, etc.)
    • Explicit ASK criteria (architectural, security-sensitive, behavior changes)
    • Updated output format: auto-fixed vs user-approved vs wontfix vs deferred
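The classify step described above can be sketched as a simple rule table. This is an illustrative Python sketch, not the actual spec: the bucket names come from the PR description, but the finding "kinds" and the default-to-ASK behavior are assumptions for the example.

```python
# Hypothetical sketch of the Classify step. Finding kinds are illustrative,
# paraphrased from the AUTO-FIX / ASK criteria listed in this PR.
AUTO_FIX_KINDS = {"unused-import", "missing-await", "empty-catch"}
ASK_KINDS = {"architectural", "security-sensitive", "behavior-change"}

def classify(kind: str) -> str:
    """Route a review finding to a bucket."""
    if kind in AUTO_FIX_KINDS:
        return "AUTO-FIX"
    # Explicit ASK kinds and anything unrecognized both route to ASK,
    # so ambiguous findings are never auto-fixed (an assumption here;
    # the spec's actual fallback behavior is discussed in review below).
    return "ASK"

print(classify("unused-import"))        # AUTO-FIX
print(classify("security-sensitive"))   # ASK
print(classify("mystery-finding"))      # ASK
```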

Test plan

  • Run /multi-review on a branch with known issues — verify AUTO-FIX items are fixed without asking
  • Verify ASK items are batched into single question
  • Run /multi-review from /autotask — verify concise output

🤖 Generated with Claude Code

Review findings are now auto-classified as AUTO-FIX (obvious, unambiguous)
or ASK (needs user judgment). Auto-fixes are applied immediately; ASK items
are batched into a single question with recommendations. This reduces
round-trips and lets /ship and /autotask flow without unnecessary pauses.

Inspired by gstack's fix-first review pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Apr 1, 2026

Code Review: Add fix-first heuristic to multi-review

Overall this is a well-designed improvement. The fix-first pattern meaningfully reduces round-trips and the classification criteria are thoughtfully chosen. A few things worth considering:


Potential Issues

Missing await as AUTO-FIX is risky

Fire-and-forget patterns are legitimate (background jobs, non-critical side effects), and adding await changes runtime behavior — it's not purely stylistic. A missing await that causes a crash is probably obvious from context, but the blanket rule could cause unintended behavior changes. Consider narrowing to: "Missing await where the result is used or error must be caught" — or moving to ASK.
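The distinction the reviewer is drawing can be shown in Python's asyncio terms (the spec itself is language-agnostic; function names here are invented for illustration):

```python
# Illustrative analogue of the narrowed rule: a missing await is only an
# unambiguous AUTO-FIX when the result is used; fire-and-forget is a
# deliberate pattern where adding await changes behavior.
import asyncio

async def fetch_user(uid):
    return {"id": uid}

async def log_metric(name):
    pass  # non-critical side effect

async def main():
    # Result is used: `user = fetch_user(1)` without await would bind a
    # coroutine, not a dict, and crash below. Clear AUTO-FIX territory.
    user = await fetch_user(1)

    # Fire-and-forget is intentional: adding await here would serialize
    # the call and change timing. This belongs in ASK, not AUTO-FIX.
    asyncio.create_task(log_metric("page_view"))
    await asyncio.sleep(0)  # let the background task run before exit

    return user["id"]

print(asyncio.run(main()))  # 1
```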

Duplicate code as AUTO-FIX

Deduplication requires choosing an abstraction (where does it live? what's the interface?), which is an architectural judgment call. Multiple agents flagging it independently increases confidence it's real duplication, but the fix still involves design decisions. This feels more like an ASK with a strong recommendation, not an AUTO-FIX.

No explicit guard for test files

AUTO-FIX criteria don't mention test files, but the ASK criteria do say "Anything that touches test assertions or expected values." An empty catch block inside a test helper, or an unused import in a test file, would qualify as AUTO-FIX by the current rules. Worth adding an explicit note like: "In test files, apply AUTO-FIX only for import/formatting issues; route logic changes to ASK."
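The suggested guard is a one-line predicate; this sketch assumes conventional test-file naming (`test_*`, `*.test.*`, `*.spec.*`), which is not specified anywhere in the PR:

```python
# Hypothetical test-file guard from the reviewer's suggestion: in test
# files, allow AUTO-FIX only for import/formatting findings.
SAFE_IN_TESTS = {"unused-import", "formatting"}

def is_test_file(path: str) -> bool:
    name = path.rsplit("/", 1)[-1]
    return name.startswith("test_") or ".test." in name or ".spec." in name

def allow_auto_fix(path: str, kind: str) -> bool:
    if is_test_file(path):
        return kind in SAFE_IN_TESTS  # logic changes in tests route to ASK
    return True  # non-test files fall through to the normal criteria

print(allow_auto_fix("src/app.py", "empty-catch"))           # True
print(allow_auto_fix("tests/test_app.py", "empty-catch"))    # False
print(allow_auto_fix("tests/test_app.py", "unused-import"))  # True
```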


Minor Observations

Ambiguous cases have no fallback: When a finding doesn't clearly fit either bucket, what's the default? Adding "When in doubt, classify as ASK" would prevent over-eager auto-fixing.

ASK format example uses hardcoded agent names (security-reviewer, architecture-auditor). These may not exist in all projects. A note like "use the actual agent name/type that caught it" would clarify intent.

Output format (only if ASK items were presented) works but is slightly awkward when all items were auto-fixed — the User-approved fixes (0 issues) section would just be absent. Consider an explicit note that this section is omitted when not applicable, so LLMs following the format don't generate an empty section.


What's Working Well

  • The 6-step Collect → Classify → Fix → Ask → Apply → Report structure is clear and easy to follow
  • AUTO-FIX criteria are well-scoped for truly low-risk items
  • Batching ASK items into a single question rather than N interruptions is the right UX
  • The /ship and /autotask concise output note is a good ergonomic touch
  • v3.0.0 version bump is appropriate — this is a meaningful behavior change, not a tweak

The missing await and duplicate code AUTO-FIX items are the highest-signal concerns — both can produce unexpected behavior changes without explicit user approval. The rest are polish suggestions.

🤖 Generated with Claude Code


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 876ea1caf6


Comment on lines +113 to +114
4. **Ask**: If any ASK items exist, present them in a single batch with context and a
recommendation for each. Format:

P1 Badge Avoid pausing /autotask for review decisions

This step makes /multi-review stop and ask the user whenever an ASK item exists, but /autotask is explicitly defined as an autonomous workflow (“without supervision” and “pause only for deal-killers” in plugins/core/commands/autotask.md:10-12,177-181). Because /autotask invokes /multi-review in balanced/deep paths, this new prompt behavior can block unattended runs on routine design/performance trade-offs that are not deal-killers.


5. **Apply user decisions**: Fix items the user approves, mark others as wontfix.


P2 Badge Preserve deferred path for large-scope findings

The workflow now instructs the model to mark non-approved ASK items as wontfix, but it no longer includes an execution step to create follow-up tasks for large-scope issues (which the philosophy section still calls out as a distinct outcome). This regression can cause valid large-scope findings to be dropped as wontfix instead of being tracked, despite the output format still advertising a Deferred category.


- Narrow missing-await AUTO-FIX: only when result is used or error must
  be caught; fire-and-forget patterns are ASK (behavior change, not style)
- Move duplicate-code dedup to ASK: abstraction design is a judgment call
- Add test-file guard: only import/formatting AUTO-FIX in test files
- Add DEFER as 3rd classification path for large-scope findings
- Add 'when in doubt, classify as ASK' rule
- Clarify agent name note in ASK format example
- Omit empty output sections rather than showing (0 issues)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Apr 1, 2026

Review: Add fix-first heuristic to multi-review

The fix-first workflow is a solid improvement — the 6-step structure is clear and the AUTO-FIX/ASK/DEFER classification is well-scoped. Many of the concerns in the previous review (cursor's comment) appear to have already been addressed in the current version (fire-and-forget caveat, test file guard, deduplication → ASK, "when in doubt" fallback). Adding a few observations not yet covered:


Inconsistency between <philosophy> and new workflow

<philosophy> still describes a v2-era decision model:

"fix now, wontfix (with reason), or large scope"

This conflicts with the new 3-category system (AUTO-FIX / ASK / DEFER). An LLM reading both sections could get confused about which model to follow. The philosophy section's language should be updated to match — or at minimum, note that the <execution> section is authoritative.

Missing severity → classification relationship

Step 1 (Collect) says "group by severity" but severity is never used as an input to the Classify step. A critical-severity finding in test assertion code still routes to ASK — fine, but the relationship (or lack thereof) between severity and classification should be stated. Otherwise the grouping in step 1 feels disconnected from the workflow.

ASK items block /autotask pipelines

When called from /autotask (no interactive user), the Ask step (step 4) would stall waiting for a response that never comes. The /ship and /autotask note at the end mentions concise output, but doesn't address what to do with ASK items in non-interactive contexts. Consider adding: "When called from /autotask or other automated contexts, treat ASK items as DEFER — create tasks for them rather than blocking on user input."
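The suggested fallback is mechanical enough to sketch. Everything here is hypothetical (function names, the `interactive` flag, the decision strings); it only illustrates the shape of "ASK degrades to DEFER when no user is present":

```python
# Sketch of the proposed non-interactive fallback: ASK items become DEFER
# (tracked follow-up tasks) instead of blocking an autonomous pipeline.
def resolve_ask_items(ask_items, interactive, prompt_user=None, create_task=None):
    resolved = []
    for item in ask_items:
        if interactive:
            decision = prompt_user(item)  # user answers "fix" or "wontfix"
        else:
            decision = "defer"  # autonomous caller (e.g. /autotask): never block
            create_task(item)   # keep the finding tracked, not dropped
        resolved.append((item, decision))
    return resolved

created = []
out = resolve_ask_items(["extract dedup helper"], interactive=False,
                        create_task=created.append)
print(out)      # [('extract dedup helper', 'defer')]
print(created)  # ['extract dedup helper']
```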

Agent attribution asymmetry in output format

The Auto-fixed section includes (agent that caught it) but User-approved fixes does not. Both sections should carry attribution for consistency and auditability — especially useful when reviewing a summary to understand which reviewers were most useful.

Minor: </execution> closing tag indentation

The </execution> tag has leading whitespace (looks like 3 spaces), which is inconsistent with other closing tags in the file. Cosmetic, but worth fixing for consistency.


What's working well

  • The cursor bot's highest-priority concerns (missing await, test file guard, deduplication) are cleanly addressed in the current version
  • Batching ASK items into a single question is the right call — minimizes interruptions
  • DEFER creating a follow-up task rather than just declining is a good practice
  • "Omit empty sections" in the report step prevents noise in clean runs

The philosophy/execution inconsistency and the autotask blocking concern are the two worth resolving before merge. The rest are polish.

🤖 Generated with Claude Code


@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


interpolation → RECOMMENDATION: parameterize, because user input flows here
2. [Agent: architecture-auditor] Service layer bypasses repository pattern
→ RECOMMENDATION: keep as-is, this is a one-off admin endpoint


ASK step blocks autonomous callers with no fallback

High Severity

The new ASK step (step 4) pauses for user input, but /autotask calls /multi-review as part of a fully autonomous "without supervision" workflow. The output-format section mentions /autotask callers only for concise output but gives no guidance on handling ASK items in autonomous contexts. The old workflow was fully autonomous (agent decided everything); the new one introduces a blocking interaction with no fallback, breaking the autonomous pipeline.


fire-and-forget patterns — those require judgment and belong in ASK)
- Obvious null/undefined checks where crash is certain
- Wrong casing or naming convention violations
- Missing error propagation (empty catch blocks, swallowed errors)


Empty catch auto-fix contradicts low-risk behavior-change criteria

Medium Severity

"Missing error propagation (empty catch blocks, swallowed errors)" is classified as AUTO-FIX, but propagating previously-swallowed errors fundamentally changes program behavior — turning silent operations into potential crashes. This directly contradicts the AUTO-FIX criteria of "low risk of changing behavior in unexpected ways" and the ASK guidance to classify items with "risk of unintended behavior change" as ASK.
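The behavior change the bot describes is easy to demonstrate in Python terms (a swallowed exception standing in for an empty catch block; function names are invented):

```python
# Demonstrates why propagating a previously-swallowed error is a behavior
# change, not a style fix: a silent no-op path becomes a crash path.
def parse_swallowed(s):
    try:
        return int(s)
    except ValueError:
        pass  # swallowed error: caller sees None and carries on
    return None

def parse_propagated(s):
    return int(s)  # after the "auto-fix": the same input now raises

print(parse_swallowed("oops"))  # None -- program continues silently

try:
    parse_propagated("oops")
except ValueError:
    print("raises")  # the fix turned a silent path into a crash path
```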


1. **Collect**: Gather all findings, deduplicate across agents, group by severity, note
which agent caught each issue

2. **Classify** each finding as AUTO-FIX, ASK, or DEFER:


Classification system has no path for agent-recognized false positives

Medium Severity

Step 2 requires classifying every finding as AUTO-FIX, ASK, or DEFER — but the old workflow's wontfix disposition (agent-recognized false positives) was dropped from classification. The output format still has a Wontfix section, and the <philosophy> block still lists "Wontfix: suggestion doesn't apply given full context" as a valid reason. The only way wontfix items arise now is via user-declined ASK items (step 5), so agent-identified false positives either get silently dropped or needlessly burden the user as ASK items.


@TechNickAI merged commit 7251c74 into main Apr 1, 2026
6 checks passed
@TechNickAI deleted the feature/fix-first-multi-review branch April 1, 2026 14:44