Plugin name
agent-council
Short description
A runtime-portable 5-agent quality gate that adjudicates text artifacts before they ship. Five role-conditioned LLM deliberators run a 2-round async protocol with cross-read rebuttal and return one verdict — SHIP, REVISE, or HOLD — plus a structured revision brief and a full audit transcript.
GitHub repository
Avyayalaya/agent-council
Plugin path inside the repository
No response
Ref to review
v0.1.0
Commit SHA to review
No response
Version
0.1.0
License identifier
MIT
Author name
Parth Sangani
Author URL
https://github.com/Avyayalaya
Homepage URL
No response
Keywords
quality-gate
multi-agent
adjudication
council
llm-as-judge
agent-orchestration
review
claude-code
mcp
Additional notes for reviewers
What's included
- 5 deliberator prompts at prompts/: Skeptic, Voice & Identity, Evidence & Calibration, Strategy & Stakes, Adjudicator. Each is self-contained and explains what the deliberator should produce.
- MCP server at mcp/agent_council_mcp_server.py, exposing the tool council_review(artifact_path, tier=1). The council_sweep and council_audit tools land in v0.1.2.
- 4 runtime adapters under src/agent_council/runtimes/: claude_cli, lmstudio, ollama, mock_cli. The Council shells out to whatever LLM CLI is configured; no SDK, no API keys, no vendor lock-in.
- 2 slash commands at commands/: /council-review and /council-sweep for Claude Code.
- Verdict log written to council_log.jsonl (append-only), with the full deliberation transcript for audit.
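To illustrate what an append-only JSONL verdict log looks like in practice, here is a minimal sketch. The helper names (append_verdict, read_verdicts) and the record fields (artifact_path, verdict, transcript) are assumptions made for illustration; they are not taken from the plugin's actual schema.

```python
import json
from pathlib import Path


def append_verdict(log_path: Path, record: dict) -> None:
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    # Mode "a" only ever adds to the end of the file.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_verdicts(log_path: Path) -> list:
    """Read the log back as a list of dicts, one per non-empty line."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each record is one self-contained line, an auditor can replay the history with ordinary line-oriented tools without parsing the file as a whole.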
What the Council adds over a single LLM-as-judge
The Council encodes a specific methodology: 2-round protocol (R1 independent critique, R2 cross-read rebuttal) + verdict-policy compilation (3+ block on irreducible → HOLD; 3+ block on reducible + Adjudicator reasons downgrade → REVISE; otherwise → SHIP). Each deliberator role applies a different evaluation frame — Skeptic (steelman), Voice & Identity, Evidence & Calibration, Strategy & Stakes — and the Adjudicator synthesizes deliberator verdicts; it does not tally a majority. The structural difference vs single-LLM-as-judge is the 5-perspective evaluation frame + cross-read rebuttal + deterministic verdict policy (policy is deterministic; deliberator inputs are not, per the Adjudicator non-determinism limitation in the scope section below). Measured comparison to single-LLM-as-judge baseline lands in v0.2 alongside the 3-arm benchmark.
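The verdict policy described above can be sketched as a small pure function. This is one possible reading of the rule, not the plugin's implementation: the names Vote, Verdict, and compile_verdict, the boolean adjudicator_downgrades flag, and the choice to count irreducible and reducible blocks separately are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Verdict(str, Enum):
    SHIP = "SHIP"
    REVISE = "REVISE"
    HOLD = "HOLD"


@dataclass(frozen=True)
class Vote:
    """Hypothetical per-deliberator outcome after round 2."""
    role: str
    blocks: bool               # does this deliberator block the artifact?
    irreducible: bool = False  # True if the blocking issue cannot be fixed by revision


def compile_verdict(votes: Iterable[Vote], adjudicator_downgrades: bool,
                    threshold: int = 3) -> Verdict:
    """Apply the policy: same votes and same flag always give the same verdict."""
    votes = list(votes)
    irreducible = sum(1 for v in votes if v.blocks and v.irreducible)
    reducible = sum(1 for v in votes if v.blocks and not v.irreducible)
    if irreducible >= threshold:
        return Verdict.HOLD
    if reducible >= threshold and adjudicator_downgrades:
        return Verdict.REVISE
    return Verdict.SHIP
```

The function is deterministic in its inputs, which matches the distinction drawn above: the policy is fixed, while the votes fed into it come from non-deterministic model output.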
Technical properties
- Modularity invariant is CI-tested. test_modularity_invariant.py enforces zero coupling between the Council and any producing agent. The Council can be removed and producing agents keep working unchanged.
- Runtime portability. 4 LLM CLI adapters ship under src/agent_council/runtimes/; porting to an additional runtime means adding one file. mock_cli is CI-verified; the claude_cli, lmstudio, and ollama adapters were manually exercised in local development.
- 105 unit tests cover the orchestrator, schema, verdict merge, prior-verdict compounding, and runtime adapters. CI is green at the v0.1.0 tag: verifiable in run #26144855968 (test.yml, head SHA 0aed51f, which is the v0.1.0 release commit).
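As a rough picture of what a "one file per runtime" adapter that shells out to a CLI might look like, consider the sketch below. The Runtime protocol, the CliRuntime class, and the complete method are hypothetical names, and the stand-in command (a Python one-liner that upper-cases stdin) replaces a real LLM CLI so the example runs anywhere; the plugin's actual adapter interface may differ.

```python
import subprocess
import sys
from typing import Protocol, Sequence


class Runtime(Protocol):
    """Hypothetical adapter interface: prompt in, completion text out."""
    def complete(self, prompt: str) -> str: ...


class CliRuntime:
    """Sends the prompt to a command on stdin and returns its stdout."""

    def __init__(self, argv: Sequence[str], timeout_s: float = 120.0) -> None:
        self.argv = list(argv)
        self.timeout_s = timeout_s

    def complete(self, prompt: str) -> str:
        result = subprocess.run(
            self.argv,
            input=prompt,
            capture_output=True,
            text=True,
            timeout=self.timeout_s,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return result.stdout.strip()


# Stand-in "LLM CLI": a Python one-liner that upper-cases whatever it reads.
echo_runtime = CliRuntime(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
)
```

Under this shape, supporting another runtime would mean supplying a different argv (or a small subclass), with no SDK import and no API key handled in the adapter itself.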
Scope of v0.1.0
v0.1.0 ships architecture and design only. Empirical evaluation (3-arm benchmark with confidence intervals + arXiv paper) lands in v0.2 per the CHANGELOG and Roadmap. Known v0.1.0 limitations documented openly:
- Same-model self-style risk (when all deliberators run on the same base model).
- Adjudicator non-determinism (R2 verdict synthesis is sensitive to deliberator output ordering).
- LM Studio HTTP 500 on large parallel prompts (workaround: lower parallelism in council.lmstudio.yaml).
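The "lower parallelism" workaround amounts to capping how many deliberator prompts are in flight at once. A minimal sketch of that idea with asyncio follows; run_bounded, its call parameter, and the max_parallel default are hypothetical, and the actual setting name inside council.lmstudio.yaml is not shown here because the source does not state it.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(
    prompts: Sequence[str],
    call: Callable[[str], Awaitable[str]],
    max_parallel: int = 2,
) -> List[str]:
    """Run call(prompt) for every prompt with at most max_parallel in flight."""
    sem = asyncio.Semaphore(max_parallel)

    async def one(prompt: str) -> str:
        # Each task waits here until a slot is free.
        async with sem:
            return await call(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))
```

Setting max_parallel to 1 degrades to fully sequential calls, which trades latency for a lower chance of overloading a local server.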
Reviewers evaluating this submission should treat the architectural + modularity claims as testable today, and the performance-vs-single-LLM-as-judge claim as deferred to v0.2 measurement.
Path note
apm.yml declares source: ./ (root). The .claude-plugin/marketplace.json is regenerated from apm.yml via apm pack. The Plugin path field above is left blank per the form's "optional when at root" rule — same pattern as my prior submission pm-skills (issue #1767).
Cross-reference to existing index entry
agent-council v0.1.0 was added to my own curated index Avyayalaya/awesome-apm v0.2.0 on 2026-05-20 as the "first quality-gate domain on the index" (existing domains are execution / analysis layers). The entry is declared in apm.yml and mirrored to .claude-plugin/marketplace.json.
Submission compliance
Public GitHub repo (gh repo view Avyayalaya/agent-council), MIT license, immutable v0.1.0 release tag pushed 2026-05-20 before this submission, no paid services, no RAI violations. Same author and authoring identity (Avyayalaya / Parth Sangani) as the previously-approved pm-skills submission (#1767 → #1770).
Submission checklist