examples: add evaluation + optimization closed-loop pipeline by han12580 · Pull Request #107 · trpc-group/trpc-agent-python

han12580 · 2026-07-01T12:23:43Z

What

Add a reproducible Evaluation + Optimization pipeline under examples/optimization/eval_optimize_loop/ that wires AgentEvaluator and AgentOptimizer into a single auditable loop.

Stages: baseline evaluation → failure attribution → prompt optimization (three-field TargetPrompt: router / system / skill) → candidate validation with per-case delta → configurable acceptance gate → audit artifacts.
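
The "candidate validation with per-case delta" stage can be pictured as a case-by-case comparison of the baseline run and the candidate run. The following is a minimal sketch and not the code in this PR: `CaseResult`, `classify_delta` and `per_case_delta` are hypothetical names, and the sketch assumes each case carries a numeric score and a pass flag. The labels follow the four outcomes listed under Highlights; `unchanged` is an extra label added here for completeness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """Hypothetical per-case record: a metric score plus a pass/fail flag."""
    case_id: str
    score: float
    passed: bool


def classify_delta(baseline: CaseResult, candidate: CaseResult, eps: float = 1e-9) -> str:
    """Label how one case changed between the baseline and the candidate run."""
    # A pass/fail flip takes priority over the raw score movement.
    if not baseline.passed and candidate.passed:
        return "newly_passed"
    if baseline.passed and not candidate.passed:
        return "newly_failed"
    delta = candidate.score - baseline.score
    if delta > eps:
        return "improved"
    if delta < -eps:
        return "regressed"
    return "unchanged"


def per_case_delta(baseline, candidate):
    """Pair results by case_id and return {case_id: (label, score_delta)}."""
    cand_by_id = {r.case_id: r for r in candidate}
    report = {}
    for base in baseline:
        cand = cand_by_id[base.case_id]  # assumes both runs cover the same cases
        report[base.case_id] = (classify_delta(base, cand), round(cand.score - base.score, 6))
    return report
```

Checking the pass/fail flip before the score delta means a case that crosses the threshold is always reported as newly passed or newly failed, even when its score moved only slightly.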

Highlights

Baseline evaluation records per-case metric score, pass/fail, failure reason and key trajectory (train + val).
Failure attribution clusters failures into six categories, each with an explainable reason.
Candidate validation re-runs the val set and classifies newly passed / newly failed / improved / regressed cases.
Acceptance gate is configurable (min val delta, no new hard fail, key-case no regression, cost budget) and rejects overfitting candidates that improve on train but regress on val.
Audit emits optimization_report.json + optimization_report.md, and persists per-round candidate prompts, cost, duration and the reproducibility config (seed / mode / dataset paths).
Offline fake backend runs the whole loop deterministically without an API key in a few seconds; a real backend drives a live multi-agent setup and the real GEPA optimizer when credentials are provided.
Six sample cases (3 train / 3 val) cover the optimizable-success, ineffective, and post-optimization-regression scenarios; a 300–500 word DESIGN.md documents the attribution method, acceptance strategy, anti-overfitting strategy and audit approach.
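
The acceptance gate described above combines four checks: a minimum val delta, no new hard fail, no regression on key cases, and a cost budget. Here is a minimal sketch of how such a gate might be wired, not the PR's actual implementation. `GateConfig`, `CandidateSummary` and `accept` are hypothetical names, the default thresholds are placeholders, and "hard fail" is assumed here to mean a case that passed at baseline and fails under the candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical gate settings mirroring the four checks named above."""
    min_val_delta: float = 0.02      # placeholder threshold
    allow_new_hard_fail: bool = False
    key_cases: frozenset = frozenset()
    cost_budget: float = 1.0         # placeholder budget, arbitrary unit


@dataclass(frozen=True)
class CandidateSummary:
    """Hypothetical summary of one candidate's train and val results."""
    train_delta: float
    val_delta: float
    newly_failed: frozenset = frozenset()
    regressed: frozenset = frozenset()
    cost: float = 0.0


def accept(summary: CandidateSummary, cfg: GateConfig):
    """Return (accepted, reasons); every failed check adds one reason."""
    reasons = []
    # Only the val delta counts: a train-only gain cannot pass the gate.
    if summary.val_delta < cfg.min_val_delta:
        reasons.append(
            f"val delta {summary.val_delta:+.3f} below minimum {cfg.min_val_delta:+.3f}"
        )
    if summary.newly_failed and not cfg.allow_new_hard_fail:
        reasons.append(f"new hard fails: {sorted(summary.newly_failed)}")
    hurt_key_cases = cfg.key_cases & (summary.newly_failed | summary.regressed)
    if hurt_key_cases:
        reasons.append(f"key cases regressed: {sorted(hurt_key_cases)}")
    if summary.cost > cfg.cost_budget:
        reasons.append(f"cost {summary.cost} exceeds budget {cfg.cost_budget}")
    return (not reasons, reasons)
```

Because the gate never reads `train_delta`, a candidate that improves on train but regresses on val is rejected by the first check alone. Returning the list of reasons rather than a bare boolean keeps the decision explainable in the audit report.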

Fixes #91

RELEASE NOTES: Add an evaluation + optimization closed-loop example under examples/optimization/eval_optimize_loop.

Add a reproducible Evaluation + Optimization pipeline under examples/optimization/eval_optimize_loop that wires AgentEvaluator and AgentOptimizer into a single auditable loop: baseline evaluation, failure attribution, prompt optimization over a three-field TargetPrompt, candidate validation with per-case delta, a configurable acceptance gate, and audit artifacts.

The pipeline records per-case metric scores, pass/fail, failure reasons and key trajectory during baseline evaluation, clusters failures into six categories, re-runs the validation set on the candidate to distinguish newly passed, newly failed, improved and regressed cases, and rejects overfitting candidates that improve on train but regress on validation. It emits optimization_report.json and optimization_report.md, and persists per-round candidate prompts, cost, duration and the reproducibility config.

A default offline fake backend runs the whole loop deterministically without an API key in a few seconds, while a real backend drives a live multi-agent setup and the real GEPA optimizer when model credentials are provided. Six sample cases cover the optimizable, ineffective and regressing scenarios.

Fixes trpc-group#91

RELEASE NOTES: Add an evaluation + optimization closed-loop example under examples/optimization/eval_optimize_loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
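
The audit step in the commit message above emits optimization_report.json and optimization_report.md alongside the reproducibility config. As a rough illustration only, a writer for those two files could look like the sketch below. `write_audit`, the report keys and the Markdown table layout are assumptions made for this example, not the schema used in the PR.

```python
import json
from pathlib import Path


def write_audit(out_dir, config, rounds, accepted):
    """Write optimization_report.json and optimization_report.md into out_dir.

    config   -- reproducibility settings, e.g. seed, mode, dataset paths
    rounds   -- list of dicts with hypothetical keys: round, cost, duration_s
    accepted -- final decision of the acceptance gate
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    report = {"config": config, "rounds": rounds, "accepted": accepted}
    json_path = out / "optimization_report.json"
    # sort_keys gives the same file contents for the same inputs on every run.
    json_path.write_text(json.dumps(report, indent=2, sort_keys=True), encoding="utf-8")

    lines = [
        "# Optimization report",
        "",
        f"Accepted: {accepted}",
        "",
        "| round | cost | duration_s |",
        "| --- | --- | --- |",
    ]
    for r in rounds:
        lines.append(f"| {r['round']} | {r['cost']} | {r['duration_s']} |")
    md_path = out / "optimization_report.md"
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return json_path, md_path
```

Writing the JSON with sorted keys makes two runs with the same seed and inputs easy to compare with a plain file diff, which fits the reproducibility goal stated in the PR.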

github-actions · 2026-07-01T12:24:08Z

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

han12580 · 2026-07-01T12:26:34Z

I have read the CLA Document and I hereby sign the CLA

codecov · 2026-07-01T12:26:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@73655ab). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             main        #107   +/-   ##
==========================================
  Coverage        ?   87.64107%           
==========================================
  Files           ?         433           
  Lines           ?       41557           
  Branches        ?           0           
==========================================
  Hits            ?       36421           
  Misses          ?        5136           
  Partials        ?           0


Rook1ex added a commit to trpc-group/cla-database that referenced this pull request Jul 1, 2026

@han12580 has signed the CLA in trpc-group/trpc-agent-python#107

e8f34ee

han12580 wants to merge 1 commit into trpc-group:main from han12580:example/eval-optimize-loop
