
Releases: EntityProcess/agentv

v4.20.0

16 Apr 06:07

What's Changed

Full Changelog: v4.19.0...v4.20.0

v4.20.0-next.1

16 Apr 06:02

Pre-release

What's Changed

Full Changelog: v4.19.0...v4.20.0-next.1

v4.19.0

16 Apr 03:56

What's Changed

  • refactor(core): rename Evaluator to Grader across codebase by @christso in #1111
  • feat(cli): incremental eval runs — resume, append, and aggregate by @christso in #1110
  • feat(core): rename total_budget_usd to budget_usd by @christso in #1117
  • feat(core): add beforeAll, budgetUsd, turns, aggregation to programmatic API by @christso in #1119
  • feat(cli): add *.eval.ts auto-discovery by @christso in #1120

Full Changelog: v4.17.1...v4.19.0

v4.19.0-next.1

16 Apr 03:55

Pre-release

What's Changed

  • feat(core): add beforeAll, budgetUsd, turns, aggregation to programmatic API by @christso in #1119
  • feat(cli): add *.eval.ts auto-discovery by @christso in #1120

Full Changelog: v4.18.0-next.1...v4.19.0-next.1

v4.18.0-next.1

16 Apr 03:32

Pre-release

What's Changed

  • refactor(core): rename Evaluator to Grader across codebase by @christso in #1111
  • feat(cli): incremental eval runs — resume, append, and aggregate by @christso in #1110
  • feat(core): rename total_budget_usd to budget_usd by @christso in #1117

Full Changelog: v4.17.1...v4.18.0-next.1

v4.17.1

15 Apr 11:19

What's Changed

  • feat(pipeline): add --target and --targets flags to pipeline run and pipeline input by @christso in #1108

Full Changelog: v4.17.0...v4.17.1

v4.17.1-next.1

15 Apr 11:14

Pre-release

What's Changed

  • feat(pipeline): add --target and --targets flags to pipeline run and pipeline input by @christso in #1108

Full Changelog: v4.17.0...v4.17.1-next.1

v4.17.0

15 Apr 05:51

What's Changed

  • feat(compare): add normalized gain metric by @christso in #1101
  • docs(agents): add self-describing rules for headers and test contracts by @christso in #1103
  • feat(studio): comparison analytics charts for skills/workflow benchmarking by @christso in #1104
  • docs: rename evaluators to graders by @christso in #1106
  • feat(cli): add results report subcommand by @christso in #1105

Full Changelog: v4.16.0...v4.17.0

v4.17.0-next.1

15 Apr 05:49

Pre-release

What's Changed

  • feat(compare): add normalized gain metric by @christso in #1101
  • docs(agents): add self-describing rules for headers and test contracts by @christso in #1103
  • feat(studio): comparison analytics charts for skills/workflow benchmarking by @christso in #1104
  • docs: rename evaluators to graders by @christso in #1106
  • feat(cli): add results report subcommand by @christso in #1105

Full Changelog: v4.16.0...v4.17.0-next.1

v4.16.0

14 Apr 22:32

What's Changed

  • feat(providers): add executable field to claude-cli provider by @christso in #1092
  • feat(providers): add cc-mirror provider alias by @christso in #1093
  • feat(targets): remove workspace_template, add target-level hooks by @christso in #1095
  • fix(paths): separate config dir from AGENTV_HOME data dir by @christso in #1096
  • docs(showcase): add bug-fix-benchmark example for SWE-bench style evaluation by @christso in #1091
  • docs: prefer rubrics over llm-grader with inline prompt by @christso in #1097
  • fix: auto-weight grouped rubrics shorthand by criteria count by @christso in #1099

Full Changelog: v4.15.9...v4.16.0