Fix Supervisor crash in Stackdriver remote log IO #68295

Open
23tae wants to merge 1 commit into apache:main from 23tae:fix-gcl-supervisor-crash

Conversation

23tae commented Jun 9, 2026

Contributor

This PR prevents the Airflow 3 Supervisor process from crashing entirely when transient network or IAM errors occur during Stackdriver log transmission.

Description

This PR addresses Bug 3 described in #68240.

In the new Airflow 3 Task SDK architecture, the StackdriverRemoteLogIO handler runs within the Supervisor process rather than the task process itself. If an exception is raised during _transport.send(), it propagates upwards and crashes the entire Supervisor process, disrupting task monitoring.

Key changes

  • Exception Handling: Wrapped the _transport.send() call in a try...except Exception block. When a transmission fails, the exception is caught and a warning is logged through the internal _logger, so the Supervisor process keeps running and continues monitoring the task instead of crashing.
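
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff from this PR: `FlakyTransport`, `OkTransport`, and `emit_safely` are hypothetical names, and the real handler's transport and its `send()` signature may differ.

```python
import logging

# Stand-in for the handler's internal logger (the name is an assumption).
_logger = logging.getLogger("stackdriver_remote_log_io_sketch")


class FlakyTransport:
    """Hypothetical transport that always fails, simulating a network or IAM error."""

    def send(self, record, message):
        raise ConnectionError("simulated transient network error")


class OkTransport:
    """Hypothetical transport that accepts every record."""

    def __init__(self):
        self.sent = []

    def send(self, record, message):
        self.sent.append(message)


def emit_safely(transport, record, message):
    """Send one log record; return False instead of raising when sending fails."""
    try:
        transport.send(record, message)
    except Exception:
        # Deliberately broad: a logging failure should not take down the
        # process that hosts the handler (the Supervisor, in the PR's case).
        _logger.warning("Failed to send log record to remote logging", exc_info=True)
        return False
    return True


record = logging.LogRecord("task", logging.INFO, "sketch.py", 0, "hello", None, None)
failed = emit_safely(FlakyTransport(), record, "hello")
ok_transport = OkTransport()
succeeded = emit_safely(ok_transport, record, "hello")
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` unaffected, so the hosting process can still be stopped normally.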

Verification Results

I have verified the changes using prek and breeze.

  • Static Checks (Prek): Passed
  • Unit Tests (Breeze): Passed

related: #68240


Was generative AI tooling used to co-author this PR?
  • Yes

Generated-by: Antigravity following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

Note

✅ Ready for review · @23tae@potiuk · 2026-06-12 13:30 UTC

Thanks @23tae — all checks are green and this PR is marked ready for maintainer review. The ball is with the maintainers now; a maintainer will take the next look.

Automated triage — may be imperfect.

shahar1 left a comment

Contributor


Please resolve conflicts

23tae force-pushed the fix-gcl-supervisor-crash branch from baa54c6 to dbde7e6 on June 17, 2026 07:48

23tae commented Jun 17, 2026

Contributor Author

@shahar1 Conflicts resolved. Thanks!

23tae requested a review from shahar1 on June 17, 2026 09:09

eladkal commented Jun 17, 2026

Contributor

Does this PR solve #68240?


23tae commented Jun 17, 2026

Contributor Author


Labels

  • area:logging
  • area:providers
  • provider:google: Google (including GCP) related issues
  • ready for maintainer review: Set after triaging when all criteria pass.


4 participants