Skip to content

Guard transport.send() in proc() so IAM/gRPC errors don't crash supervised components#68250

Open
goingforstudying-ctrl wants to merge 3 commits into
apache:mainfrom
goingforstudying-ctrl:fix/stackdriver-guard-transport-send
Open

Guard transport.send() in proc() so IAM/gRPC errors don't crash supervised components#68250
goingforstudying-ctrl wants to merge 3 commits into
apache:mainfrom
goingforstudying-ctrl:fix/stackdriver-guard-transport-send

Conversation

@goingforstudying-ctrl

Copy link
Copy Markdown

Fixes Bug 3 from #68240.

What

StackdriverRemoteLogIO.processorsproc() calls transport.send() without
any error handling. In Airflow 3's supervisor model, REMOTE_TASK_LOG applies
to ALL supervised components (scheduler, dag-processor, triggerer, workers).
A single IAM misconfiguration or gRPC error would crash the entire process.

Observed: dag-processor pod enters CrashLoopBackOff on every log emit when the
Kubernetes Service Account lacks the logging.logEntries.create IAM binding.

Fix

Wrap _transport.send() in try/except Exception and emit a
logging.warning instead of propagating. Log delivery is best-effort;
a Cloud Logging error should never kill a task-executing process.

Changes

  • Guarded _transport.send() call in proc() with try/except
  • New test test_processors_survives_transport_send_failure verifies:
    • proc() returns the event (doesn't crash)
    • warning is logged with the failure details

relates to #68240

@henry3260 henry3260 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify how the observed IAM, gRPC error propagates from _transport.send()? StackdriverRemoteLogIO uses BackgroundThreadTransport by default, whosesend()only enqueues the entry. The network request happens in the background worker, where exceptions frombatch.commit()are already caught.

Comment thread providers/google/tests/unit/google/cloud/log/test_stackdriver_task_handler.py Outdated
Comment thread providers/google/tests/unit/google/cloud/log/test_stackdriver_task_handler.py Outdated
@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stackdriver-guard-transport-send branch 18 times, most recently from c55644c to 1e8ef8b Compare June 11, 2026 17:15
@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stackdriver-guard-transport-send branch 17 times, most recently from d7f36ad to b4f0658 Compare June 17, 2026 05:26
@shahar1 shahar1 changed the title fix(google): guard transport.send() in proc() so IAM/gRPC errors don't crash supervised components Guard transport.send() in proc() so IAM/gRPC errors don't crash supervised components Jun 17, 2026
@shahar1

shahar1 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Could you clarify how the observed IAM, gRPC error propagates from _transport.send()? StackdriverRemoteLogIO uses BackgroundThreadTransport by default, whosesend()only enqueues the entry. The network request happens in the background worker, where exceptions frombatch.commit()are already caught.

@goingforstudying-ctrl please address Henry's concern, thanks!

@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stackdriver-guard-transport-send branch 7 times, most recently from 6778691 to 45a9a36 Compare June 17, 2026 16:36
…t crash supervised components

In AF3's supervisor model REMOTE_TASK_LOG applies to ALL supervised
components (scheduler, dag-processor, triggerer, workers).  An unguarded
transport.send() failure — e.g. missing logging.logEntries.create IAM
binding — would crash the entire process.  The fix wraps send() in
try/except and logs a warning instead of propagating the exception.

relates to apache#68240
Address review feedback:
- Use mock.create_autospec(Transport) per henry3260's suggestion
- Replace caplog with mock.patch on _logger to avoid logging config
  modification
… instances

Apply reviewer's suggestion consistently across all three test methods
that instantiate mock_transport_type, not just the one originally
flagged.
@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stackdriver-guard-transport-send branch from 45a9a36 to 8c7b68c Compare June 17, 2026 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:logging area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants