Fix _sock_recv infinite loop when StatusDB TCP connection drops #322
Open
TKeji wants to merge 1 commit into pytest-dev:master
When using --reruns with pytest-xdist, every test makes two blocking TCP
calls to the StatusDB server (get_test_failures and set_test_reruns in
pytest_runtest_protocol). If the server-side connection drops, _sock_recv
enters an infinite loop because recv(1) returns b'' (empty bytes) on a
closed socket, but the code only checks for the newline delimiter:
    while True:
        b = conn.recv(1)
        if b == self.delim:  # b'' != b'\n' -> never breaks
            break
        buf += b
This causes xdist workers to hang indefinitely at ~90% CPU, appearing
stuck on a test that never completes. The hang persists until the process
is manually killed.
The fix adds a check for empty bytes from recv(1) and raises
ConnectionError, which surfaces as an INTERNALERROR that xdist handles
by replacing the worker.
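A minimal sketch of the check described above, not the PR's actual diff. It assumes a newline delimiter and uses a hypothetical standalone function, sock_recv, in place of the SocketDB._sock_recv method. The error message is an assumption chosen to be compatible with the "closed unexpectedly" pattern that the regression test matches.

```python
import socket

DELIM = b"\n"  # assumed delimiter, per the newline check described above


def sock_recv(conn):
    """Read one delimiter-terminated message, failing fast on EOF."""
    buf = b""
    while True:
        b = conn.recv(1)
        if b == b"":
            # Empty bytes from recv() mean the peer closed the connection.
            raise ConnectionError("connection closed unexpectedly")
        if b == DELIM:
            break
        buf += b
    return buf.decode()


# A connected socket pair stands in for the worker/server link.
left, right = socket.socketpair()
right.sendall(b"hello\n")
message = sock_recv(left)  # a complete message is returned as before

right.close()  # simulate the server side dropping the connection
try:
    sock_recv(left)
    eof_detected = False
except ConnectionError:
    eof_detected = True
finally:
    left.close()
```

The EOF check comes before the delimiter comparison so that a closed connection can never be mistaken for "no delimiter yet".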
Pull request overview
This PR fixes a hang in the StatusDB socket protocol used during pytest-xdist runs with --reruns by making _sock_recv detect closed TCP connections and fail fast instead of looping forever.
Changes:
- Update SocketDB._sock_recv to raise ConnectionError when recv(1) returns empty bytes (closed connection).
- Add a regression test covering the closed-connection behavior.
- Document the fix in the changelog.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/pytest_rerunfailures.py | Adds an EOF (b"") check in _sock_recv and raises ConnectionError to avoid infinite loops on dropped connections. |
| tests/test_pytest_rerunfailures.py | Adds a unit test using socket.socketpair() to ensure _sock_recv raises on closed connections. |
| CHANGES.rst | Adds a 16.2 (unreleased) changelog entry describing the fix and impact in xdist runs. |
Comments on lines +1422 to +1428 (three review comments, anchored at +1424 to +1427, +1422 to +1428, and +1425 to +1428):

    s2.close()  # Close one end; recv on s1 will return b""

    db = SocketDB()
    with pytest.raises(ConnectionError, match="closed unexpectedly"):
        db._sock_recv(s1)

    s1.close()
Problem
When using --reruns with pytest-xdist, every test makes two blocking TCP calls to the StatusDB server (get_test_failures and set_test_reruns in pytest_runtest_protocol). If the server-side connection drops, _sock_recv enters an infinite loop: recv(1) returns b"" (empty bytes) on a closed socket, but the code only checks for the newline delimiter. Since b"" != b"\n" is always True, the loop never exits.
This causes xdist workers to hang indefinitely at ~90% CPU, appearing stuck on a test that never completes ([pytest-xdist running] ...). The hang persists until the process is manually killed.
Fix
Add a check for empty bytes from recv(1) and raise ConnectionError. The ConnectionError propagates as an INTERNALERROR that xdist handles by replacing the worker, which is much better than hanging forever.
Reproduction
Minimal reproduction (proves the infinite loop on the unpatched version):
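A sketch of what such a reproduction could look like, assuming the unpatched loop shown in the description. The function name unpatched_recv_bounded and the MAX_ITERATIONS cap are hypothetical additions so the demonstration terminates; the real loop has no cap and would spin forever.

```python
import socket

DELIM = b"\n"  # assumed delimiter, per the newline check in the description
MAX_ITERATIONS = 10_000  # hypothetical cap; the unpatched code has none


def unpatched_recv_bounded(conn):
    """Unpatched receive loop, with an iteration cap added for safety."""
    buf = b""
    iterations = 0
    while iterations < MAX_ITERATIONS:
        b = conn.recv(1)
        iterations += 1
        if b == DELIM:  # never true once the peer has closed
            break
        buf += b  # appending b"" leaves buf unchanged
    return buf, iterations


s1, s2 = socket.socketpair()
s2.close()  # peer is gone; recv on s1 now returns b"" immediately, every time
buf, iterations = unpatched_recv_bounded(s1)
s1.close()
```

Here the loop exits only because the cap is reached, with buf still empty. Each recv(1) returns at once instead of blocking, which is consistent with the high CPU usage described above.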
Full reproduction in a test run: monkey-patch ServerStatusDB.run_connection to close the server-side connection after a few requests, then run pytest --reruns=1 -n 1 --dist=loadgroup. The worker hangs on the next test's db.get_test_failures() call in _sock_recv.
Impact
Affects any xdist run with --reruns enabled. Without --reruns, the TCP protocol is never exercised (pytest_runtest_protocol returns early), so the bug doesn't manifest.
Checklist
CHANGES.rst