fix: reconnect to platform events websocket after connection drop #967

Draft
vdusek wants to merge 2 commits into master from fix/events-websocket-reconnect


vdusek (Contributor) commented on Jun 11, 2026

Description

  • Previously, ApifyEventManager._process_platform_messages connected to the platform events websocket exactly once. Any dropped connection raised ConnectionClosedError and permanently ended the processing task, while a graceful server-side close exited silently. In both cases, the Actor missed all subsequent platform events (MIGRATING, ABORTING, PERSIST_STATE) for the rest of the run.
  • The connection now uses the websockets reconnect iterator, which re-establishes the connection after every drop or graceful close, with backoff on failed attempts. An abnormal drop is logged as a warning and a graceful close as info, both with the close code and reason, and a successful reconnect is logged as well.
  • A process_exception callback keeps errors fatal before the first successful connection, so Actor.init still fails fast on a misconfigured URL instead of hanging. After the first connection, the default websockets classification decides which errors are transient and retried.
  • __aexit__ now cancels the processing task before closing the websocket. Otherwise, every clean shutdown would trigger a spurious reconnect attempt.
  • Added a regression test, parametrized over graceful and abnormal closes, that drops the connection server-side and asserts the drop is logged, the client reconnects, and events keep arriving. It replaces the obsolete mid-stream disconnect test, whose premise (the task ends after a drop) no longer holds.
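The shutdown-ordering point above can be illustrated with a minimal, standard-library-only sketch. `FakeConnection`, `ReconnectingManager`, and `run_scenario` are hypothetical stand-ins, not the SDK's `ApifyEventManager` or the `websockets` client; the `while True` loop only mimics an iterator that reconnects after every close.

```python
import asyncio
import contextlib


class FakeConnection:
    """Hypothetical stand-in for a websocket connection."""

    def __init__(self) -> None:
        self._closed = asyncio.Event()

    def close(self) -> None:
        self._closed.set()

    async def wait_closed(self) -> None:
        await self._closed.wait()


class ReconnectingManager:
    """Hypothetical manager whose processing task reconnects after every close."""

    def __init__(self) -> None:
        self.connect_attempts = 0
        self.connection = None
        self._task = None

    async def _process(self) -> None:
        # Mimics a reconnect loop: each pass "connects", then waits for a close.
        while True:
            self.connect_attempts += 1
            self.connection = FakeConnection()
            await self.connection.wait_closed()

    async def start(self) -> None:
        self._task = asyncio.create_task(self._process())
        await asyncio.sleep(0.01)  # give the task time to make its first connection

    async def stop(self, *, cancel_first: bool) -> None:
        if not cancel_first:
            # Old order: closing first looks like a drop, so the loop reconnects.
            self.connection.close()
            await asyncio.sleep(0.01)
        self._task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await self._task
        # Closing after cancellation: no task is left to react by reconnecting.
        self.connection.close()


async def run_scenario(*, cancel_first: bool) -> int:
    manager = ReconnectingManager()
    await manager.start()
    await manager.stop(cancel_first=cancel_first)
    return manager.connect_attempts
```

Running `asyncio.run(run_scenario(cancel_first=True))` leaves a single connection attempt, whereas closing first produces a second, spurious one before the task is cancelled.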

🤖 Generated with Claude Code

vdusek added the adhoc (ad-hoc unplanned task added during the sprint) and t-tooling (issues owned by the tooling team) labels on Jun 11, 2026
vdusek self-assigned this on Jun 11, 2026
codecov (Bot) commented on Jun 11, 2026

Codecov Report

❌ Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.98%. Comparing base (2cc5602) to head (fe082f1).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/apify/events/_apify_event_manager.py | 96.42% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #967      +/-   ##
==========================================
+ Coverage   89.91%   89.98%   +0.06%     
==========================================
  Files          49       49              
  Lines        3085     3095      +10     
==========================================
+ Hits         2774     2785      +11     
+ Misses        311      310       -1     
| Flag | Coverage Δ |
|---|---|
| e2e | 35.92% <7.14%> (-0.06%) ⬇️ |
| integration | 56.83% <7.14%> (-0.16%) ⬇️ |
| unit | 78.83% <96.42%> (+0.13%) ⬆️ |

Flags with carried forward coverage won't be shown.


After the first successful connection, delegate to the default `websockets` transient/fatal classification
instead of retrying every error. Log graceful and abnormal closes (with close code and reason) as well as
reconnect success, and cover both close paths with parametrized tests.
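
A rough sketch of the "fatal before the first connection, delegate afterwards" rule described in this commit, assuming the `websockets` convention that a `process_exception` callback returns `None` for an error to retry and returns the exception for one to raise. `FailFastUntilConnected` and `simplified_default_classifier` are hypothetical names; the classifier is a simplified stand-in for the library default, not its actual implementation, and the PR's real callback may be structured differently.

```python
from typing import Callable, Optional

# A classifier returns None for "transient, retry" or an exception for "fatal, raise".
Classifier = Callable[[Exception], Optional[Exception]]


def simplified_default_classifier(exc: Exception) -> Optional[Exception]:
    """Hypothetical stand-in for the library default: network errors are transient."""
    if isinstance(exc, (OSError, TimeoutError)):
        return None
    return exc


class FailFastUntilConnected:
    """Treat every error as fatal until the first successful connection."""

    def __init__(self, fallback: Classifier) -> None:
        self._fallback = fallback
        self._connected_once = False

    def mark_connected(self) -> None:
        # Call this once the first connection has been established.
        self._connected_once = True

    def __call__(self, exc: Exception) -> Optional[Exception]:
        if not self._connected_once:
            # Before the first connection, fail fast (e.g. on a misconfigured URL).
            return exc
        # Afterwards, let the fallback decide what is transient and what is fatal.
        return self._fallback(exc)
```

With this shape, a `ConnectionRefusedError` is returned as fatal before `mark_connected()` is called and classified as transient afterwards, while an error the fallback considers fatal (here, a `ValueError`) stays fatal in both phases.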