feat(ai): LangChain agent test harness — fake() AIMessage support, fake_chat_model, Agent.record()#138

Closed: bedus-creation wants to merge 1 commit into `main` from `task/langchain-agent-test-harness`
@bedus-creation

What

Adds a lightweight test harness on top of LangChain so tests can run LLM agents without hitting a real model API. Builds on the existing `ai/` module (whose `Agent.fake()` glob-stubbing + `AgentSnapshot` replay already existed) and fills in the LangChain pieces sketched in the reference prototype (`example/agents/tinker.py`).

This is the lower-risk, additive approach: the existing raw-SDK `_run` path is untouched, and all prior tests stay green.

Changes

  • `fake()` accepts LangChain `AIMessage` values (alongside `AgentResponse` / `AgentSnapshot`). On a pattern match the `AIMessage` is converted to an `AgentResponse`, preserving `content` and `tool_calls` — the provider is never called.
  • `fake_chat_model(turns)` / `Agent.fake_model(turns)` — build a fake chat model that replays a scripted sequence of assistant turns (`AIMessage` or `str`). Pass it straight to `langchain.agents.create_agent(model=..., tools=[...])` to drive a full tool-calling loop offline (`human → ai(tool call) → tool → ai(final)`). Includes the `bind_tools()` shim that `GenericFakeChatModel` lacks (otherwise `create_agent` raises `NotImplementedError` when tools are present).
  • `Agent.record(path, *, pattern="*")`: explicit VCR-style record-then-replay entry point. The first run calls the real provider and saves the response; later runs replay from disk.
  • New optional `[langchain]` extra (`langchain>=1.0`, `langchain-core>=1.0`). All langchain imports are lazy, so the base package imports without the extra installed.
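
For readers unfamiliar with the VCR pattern behind `Agent.record()`, the record-then-replay idea can be sketched with the standard library alone. This is an illustrative sketch, not the code in this PR: `record_or_replay`, `fake_provider`, and the JSON cassette layout are hypothetical names and choices made for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def record_or_replay(path: Path, call_provider: Callable[[], dict]) -> dict:
    """Hypothetical helper: replay a saved response, or record one if missing."""
    if path.exists():
        # Replay: the cassette exists, so the provider is never called.
        return json.loads(path.read_text(encoding="utf-8"))
    # Record: call the provider once and save its response for later runs.
    response = call_provider()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(response, indent=2), encoding="utf-8")
    return response


calls = []


def fake_provider() -> dict:
    """Stand-in for a real model call; counts how often it is invoked."""
    calls.append(1)
    return {"content": "hello", "tool_calls": []}


with tempfile.TemporaryDirectory() as tmp:
    cassette = Path(tmp) / "cassettes" / "greeting.json"
    first = record_or_replay(cassette, fake_provider)   # records
    second = record_or_replay(cassette, fake_provider)  # replays from disk
```

In this sketch the second call returns the same payload without invoking the provider again, which is the property the `record()` tests describe as "replay when cassette exists, record+save when missing".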

Design notes

  • LangChain's `ToolCall` is a `TypedDict` (unhashable), so it cannot be a literal dict key — the prototype's `{ToolCall(...): AIMessage(...)}` would raise `TypeError` at construction. Tool-call sequencing is therefore expressed the idiomatic LangChain way: an ordered list of turns fed to `fake_chat_model`, where tool-calling turns are `AIMessage(content="", tool_calls=[...])`.
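
The hashability point above can be reproduced without LangChain installed. The sketch below defines a local `ToolCall` `TypedDict` as a stand-in for LangChain's type (the field names here are illustrative), shows that an instance is a plain `dict` and so cannot be a dict key, and then expresses the same sequence as an ordered list of turns. The turn dicts are a simplified stand-in for `AIMessage` values, not the harness's actual data model.

```python
from typing import Any, TypedDict


class ToolCall(TypedDict):
    """Local stand-in for LangChain's ToolCall, for illustration only."""
    name: str
    args: dict[str, Any]
    id: str


call = ToolCall(name="add", args={"a": 1, "b": 2}, id="call_1")

# At runtime a TypedDict instance is an ordinary dict.
is_plain_dict = type(call) is dict

# Dicts are unhashable, so using one as a dict key fails at construction.
try:
    {call: "final answer"}
    key_error = None
except TypeError as exc:
    key_error = str(exc)

# The ordered-list alternative: each entry is one scripted assistant turn.
turns = [
    {"content": "", "tool_calls": [call]},  # assistant asks for a tool call
    {"content": "3", "tool_calls": []},     # assistant gives the final answer
]
```

An ordered list also preserves the sequence of turns explicitly, which a mapping keyed by tool call would not guarantee if the same call appeared twice.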

Tests

  • `tests/ai/test_agent_record.py` — `record()` VCR semantics (replay when cassette exists, record+save when missing, custom pattern, chaining, roundtrip).
  • `tests/ai/test_agent_langchain.py` — `AIMessage` fakes (content + tool_calls, no `_run`), and a real `create_agent` tool loop running entirely offline. Guarded with `importorskip` so envs without the extra skip cleanly.
  • Full suite: 1559 passed, 7 skipped; ruff clean.

…ort, fake_chat_model, Agent.record()

Add a lightweight harness for running LLM agents in tests without hitting
a real model API:

- fake() now accepts LangChain AIMessage values (content + tool_calls) in
  addition to AgentResponse and AgentSnapshot
- fake_chat_model()/Agent.fake_model() build a fake chat model that replays
  a scripted sequence of turns, so create_agent() runs a full tool-calling
  loop offline (adds the bind_tools() shim GenericFakeChatModel lacks)
- Agent.record() provides an explicit VCR-style record-then-replay entry point
- new optional [langchain] extra; langchain imports stay lazy so the base
  package imports without it

Tests: 11 new (record VCR + AIMessage fakes + offline create_agent loop);
full suite green (1559 passed, 7 skipped).