Releases: EntityProcess/agentv
v4.20.0
What's Changed
- feat(cli): add --budget-usd run-level cost cap by @christso in #1118
- feat(bench): autoresearch optimization loop (#958, #746, #748) by @christso in #1112
- refactor(bench): extract autoresearch to reference file by @christso in #1124
- feat(core): expose {{ tool_calls }} template variable for LLM graders by @christso in #1123
Full Changelog: v4.19.0...v4.20.0
v4.20.0-next.1
What's Changed
- feat(cli): add --budget-usd run-level cost cap by @christso in #1118
- feat(bench): autoresearch optimization loop (#958, #746, #748) by @christso in #1112
- refactor(bench): extract autoresearch to reference file by @christso in #1124
- feat(core): expose {{ tool_calls }} template variable for LLM graders by @christso in #1123
Full Changelog: v4.19.0...v4.20.0-next.1
v4.19.0
What's Changed
- refactor(core): rename Evaluator to Grader across codebase by @christso in #1111
- feat(cli): incremental eval runs — resume, append, and aggregate by @christso in #1110
- feat(core): rename total_budget_usd to budget_usd by @christso in #1117
- feat(core): add beforeAll, budgetUsd, turns, aggregation to programmatic API by @christso in #1119
- feat(cli): add *.eval.ts auto-discovery by @christso in #1120
Full Changelog: v4.17.1...v4.19.0
v4.19.0-next.1
What's Changed
- feat(core): add beforeAll, budgetUsd, turns, aggregation to programmatic API by @christso in #1119
- feat(cli): add *.eval.ts auto-discovery by @christso in #1120
Full Changelog: v4.18.0-next.1...v4.19.0-next.1
v4.18.0-next.1
v4.17.1
What's Changed
- feat(pipeline): add --target and --targets flags to pipeline run and pipeline input by @christso in #1108
Full Changelog: v4.17.0...v4.17.1
v4.17.1-next.1
What's Changed
- feat(pipeline): add --target and --targets flags to pipeline run and pipeline input by @christso in #1108
Full Changelog: v4.17.0...v4.17.1-next.1
v4.17.0
What's Changed
- feat(compare): add normalized gain metric by @christso in #1101
- docs(agents): add self-describing rules for headers and test contracts by @christso in #1103
- feat(studio): comparison analytics charts for skills/workflow benchmarking by @christso in #1104
- docs: rename evaluators to graders by @christso in #1106
- feat(cli): add results report subcommand by @christso in #1105
Full Changelog: v4.16.0...v4.17.0
v4.17.0-next.1
What's Changed
- feat(compare): add normalized gain metric by @christso in #1101
- docs(agents): add self-describing rules for headers and test contracts by @christso in #1103
- feat(studio): comparison analytics charts for skills/workflow benchmarking by @christso in #1104
- docs: rename evaluators to graders by @christso in #1106
- feat(cli): add results report subcommand by @christso in #1105
Full Changelog: v4.16.0...v4.17.0-next.1
v4.16.0
What's Changed
- feat(providers): add executable field to claude-cli provider by @christso in #1092
- feat(providers): add cc-mirror provider alias by @christso in #1093
- feat(targets): remove workspace_template, add target-level hooks by @christso in #1095
- fix(paths): separate config dir from AGENTV_HOME data dir by @christso in #1096
- docs(showcase): add bug-fix-benchmark example for SWE-bench style evaluation by @christso in #1091
- docs: prefer rubrics over llm-grader with inline prompt by @christso in #1097
- fix: auto-weight grouped rubrics shorthand by criteria count by @christso in #1099
Full Changelog: v4.15.9...v4.16.0