Unify agent model between agent and tests #66

Merged
u9g merged 1 commit into main from unify-agent-model on Apr 14, 2026

u9g (Contributor) commented Apr 14, 2026

Summary

  • Extract AGENT_MODEL constant in agent.py so tests use the same LLM as production
  • Separate agent and judge LLMs in tests so the agent session matches production while evals continue using a cheaper model
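A minimal sketch of the two summary points, written in Python to match agent.py. Everything in it is hypothetical: the `LLM` stand-in class, `JUDGE_MODEL`, the helper functions, and the model strings are illustrative placeholders, not the repo's actual code or the agent framework's API. It only shows the shape of the change: one shared constant feeds both the production session and the test session, while the judge model is chosen separately.

```python
from dataclasses import dataclass

# Hypothetical placeholder model names; the real values live in the repo.
AGENT_MODEL = "example-chat-model"   # shared by production and tests
JUDGE_MODEL = "example-judge-model"  # used only by evals


@dataclass(frozen=True)
class LLM:
    """Stand-in for an LLM client; it only records which model it targets."""
    model: str


def make_session_llm() -> LLM:
    # Production entrypoint: the agent session is built from the shared constant.
    return LLM(model=AGENT_MODEL)


def make_test_llms() -> tuple[LLM, LLM]:
    # Tests: the agent session reuses AGENT_MODEL so it matches production,
    # while the judge gets its own, independently chosen model.
    agent_llm = LLM(model=AGENT_MODEL)
    judge_llm = LLM(model=JUDGE_MODEL)
    return agent_llm, judge_llm
```

Because both code paths read the same constant, changing the production model automatically changes what the tests exercise; only the judge can differ.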

Test plan

  • uv run pytest: 3/3 passed

u9g force-pushed the unify-agent-model branch from 788b9ed to a687c8f on April 14, 2026 21:36
Extract AGENT_MODEL constant in agent.py so tests use the same model as production.
u9g force-pushed the unify-agent-model branch from a687c8f to 22e73b1 on April 14, 2026 21:39
u9g merged commit 1bab7d4 into main on Apr 14, 2026
6 checks passed
u9g deleted the unify-agent-model branch on April 14, 2026 21:46
bcherry (Contributor) commented Apr 29, 2026

@theomonnom @u9g @Topherhindman I'm not sure this is really a good idea. The chat model isn't really appropriate for judging, so it's not great to push people into thinking they have to use the same model. The real issue is that the agent in the tests was using a different LLM than the one in production, which should be fixed by putting the LLM property on the Agent itself. That would be better encapsulation and the right pattern to demonstrate.

(I saw the Node one took the approach of a different judge, which is good, although I disagree with using a cheaper model for judging. It seems like it would be most valuable to use a bigger model to test the fast chat model you use in conversation. But it too would benefit from showing the more durable pattern of putting the agent's LLM onto the agent itself.)
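The pattern proposed in this comment (the agent owns its LLM, and the judge is picked independently) can be sketched roughly as follows. This is a hypothetical illustration in plain Python: `LLM`, `Assistant`, `Session`, the helper functions, and the model strings are placeholders, not the real agent framework's classes or signatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLM:
    """Stand-in for an LLM client (hypothetical); records only the model name."""
    model: str


class Assistant:
    """Hypothetical agent class that owns its conversation LLM."""

    def __init__(self) -> None:
        # The model choice lives on the agent itself, not in the session setup,
        # so production and tests cannot drift apart.
        self.llm = LLM(model="example-fast-chat-model")


class Session:
    """Hypothetical session that talks through whatever LLM its agent carries."""

    def __init__(self, agent: Assistant) -> None:
        self.agent = agent

    @property
    def model(self) -> str:
        return self.agent.llm.model


def production_session() -> Session:
    return Session(Assistant())


def build_test_setup() -> tuple[Session, LLM]:
    # Tests build the same Assistant as production; the judge is chosen
    # independently, here a (hypothetical) larger model, as the comment suggests.
    return Session(Assistant()), LLM(model="example-larger-judge-model")
```

Compared with a shared module-level constant, the model is encapsulated in the agent class, so any code that constructs the agent gets the production model without importing anything extra.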

Labels

None yet

Projects

None yet

4 participants