
VeRO: Versioning Rewards and Observations


VeRO is an evaluation harness for using coding agents to optimize LLM-based agents and workflows. It treats agent code as a versioned artifact — making changes, evaluating results, and hill-climbing toward better performance using git version control.
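The versioned hill-climb can be pictured as: propose a change, evaluate it, and keep the new version only if the score improves, much like committing only improvements to a git branch. A minimal conceptual sketch — the real loop lives in `vero.policy.Policy`; `propose_patch` and `evaluate` here are toy stand-ins, not VeRO APIs:

```python
import random

# Conceptual sketch of a versioned hill-climb, not VeRO's implementation.
def hill_climb(initial_code, propose_patch, evaluate, budget):
    """Keep the best-scoring version seen so far; `history` plays the
    role of the git log."""
    best_code, best_score = initial_code, evaluate(initial_code)
    history = [(best_code, best_score)]
    for _ in range(budget):
        candidate = propose_patch(best_code)   # agent edits the code
        score = evaluate(candidate)            # run the eval suite
        if score > best_score:                 # "commit" only improvements
            best_code, best_score = candidate, score
        history.append((candidate, score))
    return best_code, best_score, history


# Toy example: the "code" is a number, a patch nudges it, and the
# evaluator rewards being close to 10.
random.seed(0)
best, score, log = hill_climb(
    initial_code=0.0,
    propose_patch=lambda c: c + random.uniform(-1, 2),
    evaluate=lambda c: -abs(10 - c),
    budget=20,
)
```

In VeRO the proposal step is a coding agent and the evaluation step is a full task run, but the accept-if-better structure is the same.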

Paper: VeRO: An Evaluation Harness for Agents to Optimize Agents

Repository Structure

vero/
├── vero/               # Core library (scale-vero)
├── vero-agents/        # Agent implementations (benchmarking targets)
├── vero-benchmarking/  # Benchmarking scripts and analysis
└── LICENSE

vero/ — Core Library

The core optimization framework. Provides:

  • Policy — orchestrates the optimization loop (agent + evaluator + git)
  • Agents — VeroAgent (OpenAI Agents SDK) and ClaudeCodeAgent (Claude Agent SDK)
  • Evaluator — runs task evaluations in isolated subprocess environments
  • Tools — MCP-based tools for agents (bash, file I/O, experiment runner, dataset viewer, etc.)
  • Traces — session analysis and LLM-based trace interpretation
Install with:

cd vero && uv sync --extra optimize

See vero/README.md for full documentation.

vero-agents/ — Agent Implementations

Agent implementations used as optimization targets:

| Agent | Description |
| --- | --- |
| generic-agent | General-purpose agent for MATH, GPQA, GAIA, GSM8K, etc. |
| web_search_agent | Web search agent for SimpleQA, Facts Search |
| KIRA | Terminal task agent for Terminal Bench 2.0 |
| tau-bench | Customer service tool-use agent |
| pharma_summarizer | Document summarization agent |

See vero-agents/README.md for details.

vero-benchmarking/ — Benchmarking

Scripts and infrastructure for running optimization experiments:

cd vero-benchmarking && uv sync --all-extras

# Run an optimization experiment
uv run python scripts/run_benchmark.py --scaffold claude-code-vmf --model sonnet --task math

# Build datasets
./scripts/build_datasets.sh

See vero-benchmarking/README.md for full documentation.

Quick Start

Prerequisites

  • Python 3.11+
  • uv
  • Git
  • Access to an LLM provider (via LiteLLM, OpenAI, Anthropic, etc.)

Install

git clone <repo-url> && cd vero

# Install core library
cd vero && uv sync --extra optimize

# Install benchmarking tools
cd ../vero-benchmarking && uv sync --all-extras

Run Your First Optimization

import asyncio

from agents import Agent as OAIAgent  # OpenAI Agents SDK

from vero.agents.vero import VeroAgent
from vero.policy import Policy


async def main():
    policy = Policy(
        project_path="/path/to/my-agent",
        dataset="/path/to/my-dataset",
        agent=VeroAgent(
            oai_agent=OAIAgent(name="VeroAgent", model="anthropic/claude-sonnet-4-5-20250929"),
        ),
        task="main",
        train_budget=10,
        max_turns=200,
    )
    best = await policy.run()
    print(f"Best commit: {best.commit}, score: {best.score}")


asyncio.run(main())

Citation

@article{ursekar2026vero,
  title={VeRO: An Evaluation Harness for Agents to Optimize Agents},
  author={Ursekar, Varun and Shanker, Apaar and Chatrath, Veronica and Xue, Yuan (Emily) and Denton, Sam},
  journal={arXiv preprint arXiv:2602.22480},
  year={2026}
}

License

MIT
