fix: make `_EvalMetricResultWithInvocation.expected_invocation` Optional for `conversation_scenario` support (#5215)
When using `conversation_scenario` for user simulation, `expected_invocation` is `None` because conversations are dynamically generated. The public model `EvalMetricResultPerInvocation` already types this as `Optional[Invocation]`, but the private `_EvalMetricResultWithInvocation` requires non-`None`, causing a pydantic `ValidationError` during post-processing.

- Make `expected_invocation` `Optional[Invocation] = None`
- Guard attribute accesses in `_print_details` to handle `None`
- Fall back to `actual_invocation.user_content` for the prompt column

Fixes google#5214
> Response from ADK Triaging Agent
>
> Hello @ASRagab, thank you for submitting this pull request! To help the reviewers, could you please add a testing plan? Including the logs or a screenshot showing that the fix works as expected would be very helpful. You can find more details in our contribution guidelines. Thanks!
Testing Evidence for PR #5215

Reproduction Script

A targeted script that exercises the exact codepath fixed by this PR:
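The original script is not reproduced here; a minimal stand-in that exercises the same codepath could look like the following (the `Invocation` stub and `_Before`/`_After` class names are simplified placeholders, not the real ADK classes):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Invocation(BaseModel):
    # Minimal stand-in for the real ADK Invocation model.
    user_content: Optional[str] = None


class _Before(BaseModel):
    # Old private model: expected_invocation was required,
    # so passing None raised a pydantic ValidationError.
    actual_invocation: Invocation
    expected_invocation: Invocation


class _After(BaseModel):
    # Fixed model: Optional with a None default, matching the public
    # EvalMetricResultPerInvocation.
    actual_invocation: Invocation
    expected_invocation: Optional[Invocation] = None


actual = Invocation(user_content="Hello")

# conversation_scenario cases supply no expected invocation.
try:
    _Before(actual_invocation=actual, expected_invocation=None)
    raised = False
except ValidationError:
    raised = True

fixed = _After(actual_invocation=actual, expected_invocation=None)
print(raised, fixed.expected_invocation)
```

With the old required field the `None` value fails validation; with the `Optional` field it is accepted and flows through to `_print_details`.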
Before (PyPI

| Check | Result |
|---|---|
| `expected_invocation: Optional[Invocation] = None` (line 93) | `None` accepted without `ValidationError` |
| `_print_details` prompt fallback to `actual_invocation.user_content` | Works correctly |
| `_print_details` expected_response fallback to `None` | `_convert_content_to_text(None)` returns `""` |
| `_print_details` expected_tool_calls fallback to `None` | `_convert_tool_calls_to_text(None)` returns `""` |
| Non-`None` `expected_invocation` (regression) | Still works as before |
Context
This was tested using a `conversation_scenario`-based evalset from an agent project. The multi-turn evalset has 5 cases that all use `conversation_scenario` (no explicit `conversation` array), which is exactly the codepath where `local_eval_service.py` sets `expected_invocation=None` during post-processing.
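That codepath can be approximated as follows (a simplified sketch using strings in place of real invocation objects; `pair_invocations` is a hypothetical helper, not a function from `local_eval_service.py`):

```python
from typing import List, Optional, Tuple


def pair_invocations(
    actual: List[str], conversation: Optional[List[str]]
) -> List[Tuple[str, Optional[str]]]:
    # When eval_case.conversation is None (conversation_scenario cases),
    # there is no recorded expected invocation to pair with, so the
    # expected slot is filled with None.
    pairs = []
    for i, act in enumerate(actual):
        expected = conversation[i] if conversation is not None else None
        pairs.append((act, expected))
    return pairs


# conversation_scenario case: dynamically generated turns, no expected side.
print(pair_invocations(["turn1", "turn2"], None))
# Recorded-conversation case: each turn pairs with its expected invocation.
print(pair_invocations(["turn1"], ["expected1"]))
```

Before the fix, each `(actual, None)` pair failed model validation in `_get_eval_metric_results_with_invocation`; after the fix it is accepted.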
Summary
- `_EvalMetricResultWithInvocation.expected_invocation` is typed as `Invocation` (required), but `local_eval_service.py:285-287` intentionally sets it to `None` when `eval_case.conversation` is `None` (i.e., `conversation_scenario` user-simulation cases)
- `EvalMetricResultPerInvocation` in `eval_metrics.py:323` already types this field as `Optional[Invocation] = None`
- The mismatch causes a `ValidationError` during post-processing in `_get_eval_metric_results_with_invocation`, after all metrics have been computed

Changes

- Make `expected_invocation` `Optional[Invocation] = None` in `_EvalMetricResultWithInvocation`
- Update `_print_details` to handle `None` (fall back to `actual_invocation.user_content` for the prompt column, `None` for expected response/tool calls)
- `_convert_content_to_text` and `_convert_tool_calls_to_text` already accept `Optional` parameters

Testing Plan
Verified with a pytest-based evaluation using `AgentEvaluator.evaluate()` against an evalset containing `conversation_scenario` cases (LLM-backed user simulation, no explicit `conversation` arrays).

Before fix — crashes after ~33 minutes of metric computation during post-processing:

After fix — the `ValidationError` is eliminated. The `None` `expected_invocation` flows through correctly because:

- the field is now `Optional[Invocation]`, matching the upstream `EvalMetricResultPerInvocation` model
- `_print_details` gracefully handles `None` by falling back to `actual_invocation.user_content` for the prompt column and passing `None` to `_convert_content_to_text` / `_convert_tool_calls_to_text` (both already accept `Optional` inputs)

Reproduction evalset (any evalset with `conversation_scenario` triggers this):

```json
{
  "eval_set_id": "test",
  "eval_cases": [{
    "eval_id": "scenario_1",
    "conversation_scenario": {
      "starting_prompt": "Hello",
      "conversation_plan": "Ask the agent a question and accept the answer."
    },
    "session_input": {"app_name": "my_agent", "user_id": "user1", "state": {}}
  }]
}
```

Fixes #5214