[External Plugin]: agent-council #1850

@Avyayalaya

Description

Plugin name

agent-council

Short description

A runtime-portable 5-agent quality gate that adjudicates text artifacts before they ship. Five role-conditioned LLM deliberators run a 2-round async protocol with cross-read rebuttal and return one verdict — SHIP, REVISE, or HOLD — plus a structured revision brief and a full audit transcript.

GitHub repository

Avyayalaya/agent-council

Plugin path inside the repository

No response

Ref to review

v0.1.0

Commit SHA to review

No response

Version

0.1.0

License identifier

MIT

Author name

Parth Sangani

Author URL

https://github.com/Avyayalaya

Homepage URL

No response

Keywords

quality-gate
multi-agent
adjudication
council
llm-as-judge
agent-orchestration
review
claude-code
mcp

Additional notes for reviewers

What's included

  • 5 deliberator prompts at prompts/ — Skeptic, Voice & Identity, Evidence & Calibration, Strategy & Stakes, Adjudicator. Each is self-contained and explains what the deliberator should produce.
  • MCP server at mcp/agent_council_mcp_server.py — tool council_review(artifact_path, tier=1). council_sweep and council_audit tools land in v0.1.2.
  • 4 runtime adapters under src/agent_council/runtimes/: claude_cli, lmstudio, ollama, mock_cli. The Council shells out to whatever LLM CLI is configured; no SDK, no API keys, no vendor lock-in.
  • 2 slash commands at commands/: /council-review and /council-sweep for Claude Code.
  • Verdict log writes to council_log.jsonl (append-only) with full deliberation transcript for audit.
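To illustrate the append-only log, here is a minimal sketch. Only the file name council_log.jsonl and the one-record-per-line JSONL shape come from the notes above; the field names (`ts`, `artifact_path`, `verdict`, `transcript`) and the helpers `append_verdict` and `read_log` are hypothetical and are not taken from the repository.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# The three verdicts named in the submission.
VERDICTS = {"SHIP", "REVISE", "HOLD"}


def append_verdict(log_path, artifact_path, verdict, transcript):
    """Append one verdict record as a single JSON line (hypothetical schema)."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "artifact_path": str(artifact_path),
        "verdict": verdict,
        "transcript": list(transcript),
    }
    # Mode "a" only ever adds to the end of the file; earlier records are untouched.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log(log_path):
    """Read every record back, one JSON object per non-empty line."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each record is a self-contained line, an auditor can read the file back with `read_log` or any line-oriented tool without parsing the whole log as one document.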

What the Council adds over a single LLM-as-judge

The Council encodes a specific methodology: a 2-round protocol (R1 independent critique, R2 cross-read rebuttal) followed by verdict-policy compilation (3+ block on irreducible → HOLD; 3+ block on reducible + Adjudicator reasons downgrade → REVISE; otherwise → SHIP). Each deliberator role applies a different evaluation frame: Skeptic (steelman), Voice & Identity, Evidence & Calibration, and Strategy & Stakes. The Adjudicator synthesizes the deliberator verdicts; it does not tally a majority. The structural difference from a single LLM-as-judge is therefore threefold: the 5-perspective evaluation frame, the cross-read rebuttal, and the deterministic verdict policy. Only the policy is deterministic; the deliberator inputs are not (see the Adjudicator non-determinism limitation under "Scope of v0.1.0" below). A measured comparison against a single-LLM-as-judge baseline lands in v0.2 alongside the 3-arm benchmark.
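The verdict policy above can be sketched as a pure function. This is one literal reading of the three rules, not the repository's implementation: the `Vote` type, the `compile_verdict` name, the `adjudicator_downgrades` flag, and the `threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SHIP = "SHIP"
    REVISE = "REVISE"
    HOLD = "HOLD"


@dataclass(frozen=True)
class Vote:
    """One deliberator's final position (hypothetical shape)."""
    role: str
    blocks: bool
    irreducible: bool = False  # only meaningful when blocks is True


def compile_verdict(votes, adjudicator_downgrades, threshold=3):
    """Apply the three stated rules in order; same inputs give the same verdict."""
    irreducible_blocks = sum(1 for v in votes if v.blocks and v.irreducible)
    reducible_blocks = sum(1 for v in votes if v.blocks and not v.irreducible)
    if irreducible_blocks >= threshold:
        return Verdict.HOLD
    if reducible_blocks >= threshold and adjudicator_downgrades:
        return Verdict.REVISE
    # Literal reading of "otherwise": every remaining case ships.
    return Verdict.SHIP


votes = [
    Vote("Skeptic", blocks=True),
    Vote("Voice & Identity", blocks=True),
    Vote("Evidence & Calibration", blocks=True),
    Vote("Strategy & Stakes", blocks=False),
]
print(compile_verdict(votes, adjudicator_downgrades=True).value)
```

Keeping the policy a pure function of the votes is what would make it deterministic even though the votes themselves come from non-deterministic model calls.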

Technical properties

  • Modularity invariant is CI-tested. test_modularity_invariant.py enforces zero coupling between the Council and any producing agent. The council can be removed and producing agents keep working unchanged.
  • Runtime portability. 4 LLM CLI adapters ship under src/agent_council/runtimes/; supporting an additional runtime means adding one file. Only mock_cli is CI-verified; the claude_cli, lmstudio, and ollama adapters were exercised manually in local development.
  • 105 unit tests cover orchestrator, schema, verdict merge, prior-verdict compounding, runtime adapters. CI green at v0.1.0 tag — verifiable run #26144855968 (test.yml, head SHA 0aed51f, which is the v0.1.0 release commit).
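One way a zero-coupling check like the modularity invariant could be expressed is by inspecting imports statically. The sketch below is an assumption about the general technique, not a description of what test_modularity_invariant.py actually does; the helper names and the example source strings are hypothetical.

```python
import ast


def imported_modules(source):
    """Return the top-level package names imported by a Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names.add(node.module.split(".")[0])
    return names


def violates_modularity(source, forbidden):
    """True if the source imports any package it must stay decoupled from."""
    return bool(imported_modules(source) & set(forbidden))


# Hypothetical producing-agent source: it must not import the Council.
producer_src = "import json\nfrom pathlib import Path\n"
print(violates_modularity(producer_src, {"agent_council"}))
```

The same helper can be pointed the other way, checking that Council modules import none of the producing agents' packages, so removing either side leaves the other importable.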

Scope of v0.1.0

v0.1.0 ships architecture and design only. Empirical evaluation (3-arm benchmark with confidence intervals + arXiv paper) lands in v0.2 per the CHANGELOG and Roadmap. Known v0.1.0 limitations documented openly:

  • Same-model self-style risk (when all deliberators run on the same base model).
  • Adjudicator non-determinism (R2 verdict synthesis is sensitive to deliberator output ordering).
  • LM Studio HTTP 500 on large parallel prompts (workaround: lower parallelism in council.lmstudio.yaml).

Reviewers evaluating this submission should treat the architectural + modularity claims as testable today, and the performance-vs-single-LLM-as-judge claim as deferred to v0.2 measurement.

Path note

apm.yml declares source: ./ (root). The .claude-plugin/marketplace.json is regenerated from apm.yml via apm pack. The Plugin path field above is left blank per the form's "optional when at root" rule — same pattern as my prior submission pm-skills (issue #1767).

Cross-reference to existing index entry

agent-council v0.1.0 was added to my own curated index Avyayalaya/awesome-apm v0.2.0 on 2026-05-20 as the "first quality-gate domain on the index" (existing domains are execution / analysis layers). The entry is declared in apm.yml and mirrored to .claude-plugin/marketplace.json.

Submission compliance

Public GitHub repo (gh repo view Avyayalaya/agent-council), MIT license, immutable v0.1.0 release tag pushed 2026-05-20 before this submission, no paid services, no RAI violations. Same author and authoring identity (Avyayalaya / Parth Sangani) as the previously-approved pm-skills submission (#1767, #1770).

Submission checklist

  • The plugin lives in a public GitHub repository.
  • The ref and/or sha I provided is immutable (release tag and/or full 40-character commit SHA), not a branch.
  • This submission follows this repository's contribution, security, and responsible AI policies.
  • This plugin is not already listed in the Awesome Copilot marketplace.

Metadata

    Labels

    external-plugin (PR updates plugins/external.json)
    needs-review:MEDIUM (Contributor reputation check flagged MEDIUM risk)
    ready-for-review (Submission passed intake validation and is ready for maintainer review)
