
feat(plugins): add on_agent_error_callback and on_run_error_callback lifecycle hooks #4974

Open
STHITAPRAJNAS wants to merge 2 commits into google:main from STHITAPRAJNAS:feat/lifecycle-error-callbacks-4774

Conversation

@STHITAPRAJNAS

Summary

Fixes #4774

When an unhandled exception propagates out of an agent's _run_async_impl / _run_live_impl, or out of the runner's execution loop, the existing after_agent_callback / after_run_callback are silently skipped. This makes fatal failures invisible to observability plugins (e.g. BigQuery analytics): failure events are never recorded, so reported success rates are inflated.

Changes

  • BasePlugin: add on_agent_error_callback(agent, callback_context, error) and on_run_error_callback(invocation_context, error) with safe no-op defaults.
  • PluginManager: add run_on_agent_error_callback / run_on_run_error_callback dispatch methods backed by a new _run_error_callbacks helper that fans out to all plugins (no early-exit) and logs — but does not propagate — individual plugin failures.
  • base_agent.py: wrap run_async / run_live generator loops in try/except; call run_on_agent_error_callback before re-raising.
  • runners.py: wrap the execute_fn generator loop in try/except; call run_on_run_error_callback before re-raising. after_run_callback is intentionally skipped on the error path so plugins can distinguish clean completions from fatal failures.
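
The dispatch behaviour described in the list above can be sketched roughly as follows. This is a self-contained illustration, not the PR's actual code: the `BasePlugin` and `PluginManager` classes here are simplified stand-ins, `run_with_error_hook`, `BrokenPlugin`, `RecordingPlugin`, and `failing_execute_fn` are hypothetical names, and the keyword-only signatures are an assumption modelled on the hook names in the summary.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class BasePlugin:
    """Simplified stand-in for BasePlugin; only the new run-error hook is modelled."""

    def __init__(self, name):
        self.name = name

    async def on_run_error_callback(self, *, invocation_context, error):
        # Safe no-op default, as described in the PR summary.
        return None


class PluginManager:
    """Simplified stand-in; the real _run_error_callbacks helper may differ."""

    def __init__(self, plugins):
        self.plugins = list(plugins)

    async def run_on_run_error_callback(self, *, invocation_context, error):
        for plugin in self.plugins:  # fan out to every plugin, no early exit
            try:
                await plugin.on_run_error_callback(
                    invocation_context=invocation_context, error=error
                )
            except Exception:
                # A failing plugin is logged, never propagated.
                logger.exception("Plugin %r failed in error callback", plugin.name)


async def run_with_error_hook(manager, invocation_context, execute_fn):
    """Hypothetical wrapper: yield events, notify plugins on failure, re-raise."""
    try:
        async for event in execute_fn(invocation_context):
            yield event
    except Exception as error:
        await manager.run_on_run_error_callback(
            invocation_context=invocation_context, error=error
        )
        raise  # the original exception is always re-raised


class BrokenPlugin(BasePlugin):
    async def on_run_error_callback(self, *, invocation_context, error):
        raise RuntimeError("plugin bug")


class RecordingPlugin(BasePlugin):
    def __init__(self, name):
        super().__init__(name)
        self.seen = []

    async def on_run_error_callback(self, *, invocation_context, error):
        self.seen.append(error)


async def failing_execute_fn(invocation_context):
    yield "event-1"
    raise ValueError("boom")


def demo():
    recorder = RecordingPlugin("recorder")
    manager = PluginManager([BrokenPlugin("broken"), recorder])
    events = []

    async def drive():
        async for event in run_with_error_hook(
            manager, {"id": "inv-1"}, failing_execute_fn
        ):
            events.append(event)

    try:
        asyncio.run(drive())
        raised = None
    except ValueError as exc:
        raised = exc
    return events, recorder.seen, raised
```

In this sketch the broken plugin runs first and raises, yet the recording plugin still receives the original `ValueError`, and the caller sees that same exception rather than the plugin's `RuntimeError`.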

Design decisions

| Decision | Rationale |
| --- | --- |
| Error callbacks are fire-and-forget (no early-exit, no return value) | They are pure observers; a broken plugin must not prevent others from recording the failure |
| after_run_callback is not called on error | Lets plugins distinguish success from failure, analogous to the split between on_model_error_callback and after_model_callback |
| Individual plugin failures are logged, not re-raised | A broken observability plugin must never hide the original error |
| The original exception is always re-raised | The framework does not suppress errors; this is notification only |
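
From a plugin author's point of view, the second decision means exactly one of the two hooks fires per run. A minimal sketch of what that allows, assuming keyword-only hook signatures; `RunOutcomePlugin`, `run_once`, `ok`, `bad`, and `tally` are hypothetical names, and `run_once` is only a stand-in for the runner's control flow:

```python
import asyncio


class RunOutcomePlugin:
    """Hypothetical observer that counts clean and failed runs separately."""

    def __init__(self):
        self.succeeded = 0
        self.failed = 0

    async def after_run_callback(self, *, invocation_context):
        self.succeeded += 1

    async def on_run_error_callback(self, *, invocation_context, error):
        self.failed += 1


async def run_once(plugin, invocation_context, work):
    """Stand-in for the runner: after_run_callback is skipped on the error path."""
    try:
        result = await work()
    except Exception as error:
        await plugin.on_run_error_callback(
            invocation_context=invocation_context, error=error
        )
        raise
    await plugin.after_run_callback(invocation_context=invocation_context)
    return result


async def ok():
    return "done"


async def bad():
    raise RuntimeError("model unavailable")


def tally():
    plugin = RunOutcomePlugin()

    async def main():
        await run_once(plugin, "inv-1", ok)
        try:
            await run_once(plugin, "inv-2", bad)
        except RuntimeError:
            pass  # the error still reaches the caller; swallowed only for the demo

    asyncio.run(main())
    return plugin.succeeded, plugin.failed
```

If after_run_callback also fired on the error path, the failed run would be counted in both tallies and the plugin would need extra state to tell the two outcomes apart.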

Testing plan

  • tests/unittests/plugins/test_lifecycle_error_callbacks.py — 12 tests covering BasePlugin defaults, PluginManager fan-out semantics, argument forwarding, and no-early-exit behaviour
  • tests/unittests/runners/test_runner_error_callbacks.py — 7 integration tests covering runner error/success paths with real InMemoryRunner
  • tests/unittests/agents/test_agent_error_callbacks.py — 11 tests covering run_async and run_live error paths
  • All 688 existing plugin + agent + runner tests pass unchanged

@adk-bot adk-bot added the core [Component] This issue is related to the core interface and implementation label Mar 24, 2026
@rohityan rohityan self-assigned this Mar 24, 2026
@rohityan
Collaborator

Hi @STHITAPRAJNAS, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Could you please fix the failing formatting tests? You can use autoformat.sh.

@rohityan rohityan requested a review from Jacksunwei March 24, 2026 20:01
@rohityan
Collaborator

Hi @Jacksunwei, can you please review this?

@STHITAPRAJNAS
Author

Formatting done

@rohityan rohityan added the needs review [Status] The PR/issue is awaiting review from the maintainer label Mar 27, 2026
@STHITAPRAJNAS STHITAPRAJNAS force-pushed the feat/lifecycle-error-callbacks-4774 branch 2 times, most recently from 358f9ed to cd4ac4e Compare March 31, 2026 05:28
@STHITAPRAJNAS
Author

Hi @Jacksunwei — gentle ping! This PR has been ready for review since Mar 25 with all checks passing. Would appreciate your feedback when you get a chance. Thanks!

…lifecycle hooks

Fixes google#4774

When an unhandled exception propagates out of an agent's _run_async_impl /
_run_live_impl, or out of the runner's execution loop, the existing
after_agent_callback / after_run_callback were silently skipped.  This made
fatal failures invisible to observability plugins (e.g. BigQuery analytics),
inflating success rates and losing failure events entirely.

Changes:
- BasePlugin: add on_agent_error_callback(agent, callback_context, error)
  and on_run_error_callback(invocation_context, error) with safe no-op defaults.
- PluginManager: add run_on_agent_error_callback / run_on_run_error_callback
  dispatch methods backed by a new _run_error_callbacks helper that fans out
  to ALL plugins (no early-exit) and logs — but does not propagate —
  individual plugin failures.
- base_agent.py: wrap run_async / run_live generator loops in try/except;
  call run_on_agent_error_callback before re-raising.
- runners.py: wrap the execute_fn generator loop in try/except;
  call run_on_run_error_callback before re-raising.
  after_run_callback is intentionally skipped on the error path so plugins
  can distinguish clean completions from fatal failures.

Tests (30 new, all passing):
- tests/unittests/plugins/test_lifecycle_error_callbacks.py
- tests/unittests/runners/test_runner_error_callbacks.py
- tests/unittests/agents/test_agent_error_callbacks.py
@STHITAPRAJNAS STHITAPRAJNAS force-pushed the feat/lifecycle-error-callbacks-4774 branch from cd4ac4e to 5b92cb8 Compare April 24, 2026 00:07
@STHITAPRAJNAS
Author

Hi @Jacksunwei, following up on this. The on_agent_error_callback and on_run_error_callback hooks fill a real gap for anyone running plugins that need to distinguish clean completions from fatal failures (e.g. alerting, metrics, cleanup). The change is confined to BasePlugin, PluginManager, base_agent.py, and runners.py, and it ships with 30 new tests. Would appreciate your thoughts whenever you have bandwidth.


Labels

core [Component] This issue is related to the core interface and implementation needs review [Status] The PR/issue is awaiting review from the maintainer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Lifecycle Error Callbacks (on_agent_error, on_run_error) to ADK Framework

3 participants