examples: add evaluation + optimization closed-loop pipeline #107

Open
han12580 wants to merge 1 commit into trpc-group:main from han12580:example/eval-optimize-loop


han12580 commented Jul 1, 2026


What

Add a reproducible Evaluation + Optimization pipeline under examples/optimization/eval_optimize_loop/ that wires AgentEvaluator and AgentOptimizer into a single auditable loop.

Stages: baseline evaluation → failure attribution → prompt optimization (three-field TargetPrompt: router / system / skill) → candidate validation with per-case delta → configurable acceptance gate → audit artifacts.
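
The stage ordering above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the three `TargetPrompt` field names follow the description, but `run_loop` and the `evaluate`, `attribute`, `optimize` and `gate` callables are hypothetical and do not reflect the actual `AgentEvaluator` or `AgentOptimizer` signatures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TargetPrompt:
    """Three optimizable prompt fields, named after the PR description."""
    router: str
    system: str
    skill: str


# Hypothetical type: maps case id -> metric score.
Scores = Dict[str, float]


def run_loop(
    prompt: TargetPrompt,
    evaluate: Callable[[TargetPrompt, str], Scores],   # (prompt, split) -> scores
    attribute: Callable[[Scores], List[str]],          # scores -> failure categories
    optimize: Callable[[TargetPrompt, List[str]], TargetPrompt],
    gate: Callable[[Scores, Scores], bool],            # (baseline_val, candidate_val)
) -> TargetPrompt:
    """Run one round: baseline -> attribution -> optimization -> validation -> gate."""
    baseline_train = evaluate(prompt, "train")
    baseline_val = evaluate(prompt, "val")
    failures = attribute(baseline_train)
    candidate = optimize(prompt, failures)
    candidate_val = evaluate(candidate, "val")
    # Keep the original prompt unless the candidate passes the gate on val.
    return candidate if gate(baseline_val, candidate_val) else prompt
```

Passing the stages in as callables is only a convenience for the sketch: it lets a deterministic fake and a live backend share the same loop.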

Highlights

  • Baseline evaluation records per-case metric score, pass/fail, failure reason and key trajectory (train + val).
  • Failure attribution clusters failures into six categories, each with an explainable reason.
  • Candidate validation re-runs the val set and classifies newly passed / newly failed / improved / regressed cases.
  • Acceptance gate is configurable (min val delta, no new hard fail, key-case no regression, cost budget) and rejects overfitting candidates that improve on train but regress on val.
  • Audit emits optimization_report.json + optimization_report.md, and persists per-round candidate prompts, cost, duration and the reproducibility config (seed / mode / dataset paths).
  • Offline fake backend runs the whole loop deterministically without an API key in a few seconds; a real backend drives a live multi-agent setup and the real GEPA optimizer when credentials are provided.
  • Six sample cases (3 train / 3 val) cover the optimizable-success, ineffective, and post-optimization-regression scenarios; a 300–500 word DESIGN.md documents the attribution method, acceptance strategy, anti-overfitting strategy and audit approach.
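
To make the candidate-validation and acceptance-gate bullets concrete, here is a minimal sketch of per-case delta classification followed by a gate check. Everything named here (`GateConfig`, `classify`, `accept`, the 0.5 pass threshold, and the choice to make the buckets mutually exclusive) is an assumption for illustration; the PR's actual configuration keys and rules may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GateConfig:
    """Hypothetical thresholds mirroring the gate options listed above."""
    min_val_delta: float = 0.0          # minimum mean-score gain on val
    allow_new_failures: bool = False    # "no new hard fail" when False
    key_cases: tuple = ()               # cases that must not regress
    cost_budget: float = float("inf")   # maximum optimization cost


def classify(before: Dict[str, float], after: Dict[str, float],
             threshold: float = 0.5) -> Dict[str, List[str]]:
    """Bucket each val case by comparing its baseline and candidate score."""
    buckets = {"newly_passed": [], "newly_failed": [],
               "improved": [], "regressed": [], "unchanged": []}
    for case, old in before.items():
        new = after[case]
        was_pass, is_pass = old >= threshold, new >= threshold
        if is_pass and not was_pass:
            buckets["newly_passed"].append(case)
        elif was_pass and not is_pass:
            buckets["newly_failed"].append(case)
        elif new > old:
            buckets["improved"].append(case)
        elif new < old:
            buckets["regressed"].append(case)
        else:
            buckets["unchanged"].append(case)
    return buckets


def accept(before, after, cost, cfg):
    """Return True only if every configured gate condition holds."""
    buckets = classify(before, after)
    # Assumes a non-empty val set; an empty one would divide by zero.
    mean_delta = (sum(after.values()) - sum(before.values())) / len(before)
    worse = set(buckets["newly_failed"]) | set(buckets["regressed"])
    return (mean_delta >= cfg.min_val_delta
            and (cfg.allow_new_failures or not buckets["newly_failed"])
            and not worse & set(cfg.key_cases)
            and cost <= cfg.cost_budget)
```

Because `accept` only ever sees val scores, a candidate that improves on train but regresses on val is rejected regardless of its train gain, which is the anti-overfitting behaviour the bullet describes.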

Fixes #91

RELEASE NOTES: Add an evaluation + optimization closed-loop example under examples/optimization/eval_optimize_loop.

Add a reproducible Evaluation + Optimization pipeline under
examples/optimization/eval_optimize_loop that wires AgentEvaluator and
AgentOptimizer into a single auditable loop: baseline evaluation, failure
attribution, prompt optimization over a three-field TargetPrompt, candidate
validation with per-case delta, a configurable acceptance gate, and audit
artifacts.

The pipeline records per-case metric scores, pass/fail, failure reasons and
key trajectory during baseline evaluation, clusters failures into six
categories, re-runs the validation set on the candidate to distinguish newly
passed, newly failed, improved and regressed cases, and rejects overfitting
candidates that improve on train but regress on validation. It emits
optimization_report.json and optimization_report.md, and persists per-round
candidate prompts, cost, duration and the reproducibility config.
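
As a rough sketch of the audit step described above, the snippet below writes one report dictionary to both artifact files. Only the two file names come from this PR; the `write_report` helper and the report keys (`accepted`, `config`, `rounds`, `cost`, `duration_s`) are hypothetical placeholders, not the schema the example actually emits.

```python
import json
from pathlib import Path


def write_report(out_dir, report):
    """Write the same audit record as JSON (for tools) and Markdown (for reviewers)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / "optimization_report.json"
    md_path = out / "optimization_report.md"
    # sort_keys keeps the JSON output stable across runs with identical inputs.
    json_path.write_text(json.dumps(report, indent=2, sort_keys=True),
                         encoding="utf-8")
    lines = ["# Optimization report", ""]
    lines.append(f"- accepted: {report['accepted']}")
    lines.append(f"- seed: {report['config']['seed']}")
    lines.append(f"- mode: {report['config']['mode']}")
    lines.append("")
    lines.append("| round | cost | duration_s |")
    lines.append("| --- | --- | --- |")
    for r in report["rounds"]:
        lines.append(f"| {r['round']} | {r['cost']} | {r['duration_s']} |")
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return json_path, md_path
```

Generating both files from a single dictionary keeps the human-readable summary from drifting away from the machine-readable record.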

A default offline fake backend runs the whole loop deterministically without
an API key in a few seconds, while a real backend drives a live multi-agent
setup and the real GEPA optimizer when model credentials are provided. Six
sample cases cover the optimizable, ineffective and regressing scenarios.

Fixes trpc-group#91

RELEASE NOTES: Add an evaluation + optimization closed-loop example under examples/optimization/eval_optimize_loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot commented Jul 1, 2026


CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅


han12580 commented Jul 1, 2026

Author

I have read the CLA Document and I hereby sign the CLA


codecov Bot commented Jul 1, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@73655ab). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             main        #107   +/-   ##
==========================================
  Coverage        ?   87.64107%           
==========================================
  Files           ?         433           
  Lines           ?       41557           
  Branches        ?           0           
==========================================
  Hits            ?       36421           
  Misses          ?        5136           
  Partials        ?           0           


Rook1ex added a commit to trpc-group/cla-database that referenced this pull request Jul 1, 2026

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build a closed loop of automated regression and prompt optimization with Evaluation + Optimization

1 participant