Guard transport.send() in proc() so IAM/gRPC errors don't crash supervised components #68250
Open
goingforstudying-ctrl wants to merge 3 commits into
henry3260
reviewed
Jun 9, 2026
henry3260
left a comment
Contributor
Could you clarify how the observed IAM/gRPC error propagates from _transport.send()? StackdriverRemoteLogIO uses BackgroundThreadTransport by default, whose send() only enqueues the entry. The network request happens in the background worker, where exceptions from batch.commit() are already caught.
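To make the question above concrete, here is a toy sketch of a queue-based transport. It is not the google-cloud-logging implementation: `QueueTransport`, `failing_commit`, and the `errors` list are hypothetical names used only for illustration. It shows why, under that design, a failing commit would not surface on the thread that calls `send()`.

```python
import queue
import threading


class QueueTransport:
    """Toy background transport (hypothetical, not the real library class)."""

    def __init__(self, commit):
        self._queue = queue.Queue()
        self._commit = commit
        self.errors = []
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def send(self, message):
        # Only enqueues; no network call happens on the caller's thread.
        self._queue.put(message)

    def _worker(self):
        while True:
            message = self._queue.get()
            try:
                self._commit(message)
            except Exception as exc:
                # Failures stay inside the worker and are never re-raised to send().
                self.errors.append(exc)
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until every queued message has been processed.
        self._queue.join()


def failing_commit(message):
    # Stand-in for a commit that fails because of a missing IAM binding.
    raise PermissionError("missing logging.logEntries.create")


transport = QueueTransport(failing_commit)
transport.send("hello")  # returns normally even though the commit will fail
transport.flush()
```

If the real transport behaves like this sketch, the crash would have to originate somewhere other than the enqueueing `send()` call, for example in a synchronous transport or during transport construction.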
Contributor
@goingforstudying-ctrl please address Henry's concern, thanks!
…t crash supervised components

In AF3's supervisor model REMOTE_TASK_LOG applies to ALL supervised components (scheduler, dag-processor, triggerer, workers). An unguarded transport.send() failure (e.g. a missing logging.logEntries.create IAM binding) would crash the entire process. The fix wraps send() in try/except and logs a warning instead of propagating the exception.

relates to apache#68240
Address review feedback:
- Use mock.create_autospec(Transport) per henry3260's suggestion
- Replace caplog with mock.patch on _logger to avoid logging config modification
… instances

Apply reviewer's suggestion consistently across all three test methods that instantiate mock_transport_type, not just the one originally flagged.
Fixes Bug 3 from #68240.
What
StackdriverRemoteLogIO.processors → proc() calls transport.send() without any error handling. In Airflow 3's supervisor model, REMOTE_TASK_LOG applies to ALL supervised components (scheduler, dag-processor, triggerer, workers). A single IAM misconfiguration or gRPC error would crash the entire process.
Observed: dag-processor pod enters CrashLoopBackOff on every log emit when the Kubernetes Service Account lacks the logging.logEntries.create IAM binding.
Fix
Wrap _transport.send() in try/except Exception and emit a logging.warning instead of propagating. Log delivery is best-effort; a Cloud Logging error should never kill a task-executing process.
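A minimal sketch of the guarded call, not the actual diff. It assumes the processor follows the common `(logger, method_name, event_dict)` convention; `FailingTransport`, `make_proc`, and the warning message are hypothetical names chosen for illustration.

```python
import logging

logger = logging.getLogger(__name__)


class FailingTransport:
    """Hypothetical stand-in for a transport whose send() raises."""

    def send(self, record, message, **kwargs):
        raise PermissionError("missing logging.logEntries.create")


def make_proc(transport):
    """Return a processor that forwards events to transport.send() on a best-effort basis."""

    def proc(log, method_name, event):
        try:
            transport.send(None, str(event.get("event", "")), severity=method_name.upper())
        except Exception as exc:
            # Delivery is best-effort: warn and keep the process alive.
            logger.warning("Failed to send log entry to remote transport: %s", exc)
        # Always hand the event on to the next processor.
        return event

    return proc
```

Catching `Exception` rather than a narrower type is deliberate in this sketch: the failure may come from IAM, gRPC, or the client library, and none of them should terminate the component. `KeyboardInterrupt` and `SystemExit` still propagate because they do not derive from `Exception`.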
Changes
- Wrap the _transport.send() call in proc() with try/except
- test_processors_survives_transport_send_failure verifies:

relates to #68240
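For reference, a self-contained sketch of how such a test could look, using `mock.create_autospec` and a patched logger as described in the review-feedback commit. This is not the PR's test: the `Transport` class, `proc` function, and `_logger` below are hypothetical stand-ins so the example runs without Airflow or google-cloud-logging installed.

```python
import logging
from unittest import mock

_logger = logging.getLogger("remote_log_sketch")


class Transport:
    """Hypothetical minimal transport interface; only send() matters here."""

    def send(self, record, message, **kwargs):
        raise NotImplementedError


def proc(transport, event):
    """Guarded send, mirroring the behaviour described in the Fix section."""
    try:
        transport.send(None, event["event"])
    except Exception:
        _logger.warning("Failed to send log entry", exc_info=True)
    return event


def test_proc_survives_transport_send_failure():
    # Autospec keeps the mock's send() signature in line with Transport.send().
    transport = mock.create_autospec(Transport, instance=True)
    transport.send.side_effect = RuntimeError("gRPC unavailable")

    # Patch the logger method instead of using caplog, so logging config is untouched.
    with mock.patch.object(_logger, "warning") as warning:
        event = {"event": "hello"}
        result = proc(transport, event)

    assert result is event  # the event is still passed through
    transport.send.assert_called_once_with(None, "hello")
    warning.assert_called_once()  # the failure was reported, not raised
```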