Environment
- asyncpg version: 0.31.0 (also reproduced on 0.30.x)
- PostgreSQL version: 16
- Python version: 3.11.14
- Platform: Linux (Kubernetes)
- pgbouncer: No
- SQLAlchemy: 2.0.23
Summary
When an asyncpg operation is cancelled via asyncio.CancelledError while mid-query, the cancellation mechanism in connect_utils._cancel can raise a built-in ConnectionError that escapes to the caller. This is problematic because:
- Callers (e.g. SQLAlchemy) expect asyncpg-specific exception types and don't handle the built-in ConnectionError
- The cancel operation is inherently best-effort: if the cancel connection fails, the error should be suppressed or wrapped, not propagated
This is related to #1211 but occurs on non-direct_tls connections via the cancel request code path.
Reproduction flow
- An asyncpg connection is executing a query (e.g. inside SQLAlchemy's session.execute())
- The asyncio task is cancelled (task.cancel())
- CancelledError propagates into protocol.query() / bind_execute
- asyncpg's cancellation handler tries to send a PostgreSQL cancel request by opening a new SSL connection via connect_utils._cancel → _create_ssl_connection
- The new connection fails (server already closed the original, or network issue)
- TLSUpgradeProto.connection_lost() raises built-in ConnectionError('unexpected connection_lost() call')
- This escapes through connect_utils._cancel (which has no error handling around _create_ssl_connection)
- Caller receives ConnectionError instead of CancelledError
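The flow above can be imitated without a database. The following is only a rough sketch using stdlib asyncio: fake_query and fake_cancel_request are hypothetical stand-ins (not asyncpg functions) for the in-flight query and for connect_utils._cancel. It shows how an error raised while handling CancelledError replaces the cancellation that the caller expected to see.

```python
import asyncio


async def fake_cancel_request():
    # Hypothetical stand-in for connect_utils._cancel: opening the cancel
    # connection fails with the built-in ConnectionError.
    raise ConnectionError("unexpected connection_lost() call")


async def fake_query():
    # Hypothetical stand-in for a query that is in flight when the task
    # is cancelled.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # The cleanup path raises, so ConnectionError replaces the
        # CancelledError that was being handled.
        await fake_cancel_request()
        raise


async def main():
    task = asyncio.create_task(fake_query())
    await asyncio.sleep(0)  # let the task start and block on its await
    task.cancel()
    try:
        await task
    except ConnectionError as exc:
        # Report what the caller saw and what it was raised on top of.
        return type(exc).__name__, type(exc.__context__).__name__
    except asyncio.CancelledError:
        return "CancelledError", None


outcome = asyncio.run(main())
print(outcome)
```

In this sketch the caller's `except asyncio.CancelledError` branch never runs; the CancelledError survives only as the `__context__` of the ConnectionError, which matches the "During handling of the above exception" chain in the traceback.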
Traceback
asyncio.exceptions.CancelledError (original exception)

During handling of the above exception, another exception occurred:

  File "asyncpg/transaction.py", line 206, in __rollback
    await self._connection.execute(query)
  File "asyncpg/connection.py", line 350, in execute
    result = await self._protocol.query(query, timeout)
  File "asyncpg/connection.py", line 1584, in _cancel
    await connect_utils._cancel(
  File "asyncpg/connect_utils.py", line 1040, in _cancel
    tr, pr = await _create_ssl_connection(
  File "asyncpg/connect_utils.py", line 752, in _create_ssl_connection
    do_ssl_upgrade = await pr.on_data
                     ^^^^^^^^^^^^^^^^
ConnectionError: unexpected connection_lost() call
Root cause
Two issues in connect_utils.py:
1. _cancel() has no error handling around _create_ssl_connection
async def _cancel(*, loop, addr, params, backend_pid, backend_secret):
    ...
    if params.ssl and params.sslmode != SSLMode.allow:
        tr, pr = await _create_ssl_connection(...)  # ← no try/except!
    ...
The cancel request is best-effort (we're telling PostgreSQL to cancel a query on a connection that may already be dead). If opening the cancel connection fails, the error should be suppressed or wrapped in asyncpg.InterfaceError, not propagated as a raw ConnectionError.
2. TLSUpgradeProto.connection_lost() raises built-in ConnectionError
def connection_lost(self, exc):
    if not self.on_data.done():
        if exc is None:
            exc = ConnectionError('unexpected connection_lost() call')
        self.on_data.set_exception(exc)
This raises a built-in Python ConnectionError, not an asyncpg exception type. Callers like SQLAlchemy check for asyncpg.InterfaceError or asyncpg.PostgresError to detect disconnects. A built-in ConnectionError bypasses all those checks, which means:
- SQLAlchemy's is_disconnect() doesn't recognize it
- SQLAlchemy's pool pre-ping handler (_do_ping_w_event) only catches self.loaded_dbapi.Error, so ConnectionError escapes
- The pool's retry logic (which would create a fresh connection) never triggers
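To illustrate why such handlers miss the error, here is a small sketch with stand-in classes. DriverError and InterfaceError below are hypothetical placeholders for a driver's DB-API error hierarchy, not the real asyncpg or SQLAlchemy classes; the only point is that the built-in ConnectionError derives from OSError and so sits outside any such hierarchy.

```python
class DriverError(Exception):
    """Hypothetical stand-in for the driver's DB-API base error."""


class InterfaceError(DriverError):
    """Hypothetical stand-in for an asyncpg-style interface error."""


def classify(exc):
    """Return how a handler that only knows driver errors treats exc."""
    try:
        raise exc
    except DriverError:
        return "handled"
    except Exception:
        # Anything outside the driver hierarchy would escape a real handler.
        return "escaped"


results = {
    "InterfaceError": classify(InterfaceError("connection lost")),
    "ConnectionError": classify(
        ConnectionError("unexpected connection_lost() call")
    ),
}
print(results)
```

A handler written as `except DriverError` treats the first case as a disconnect and lets the second propagate, which is the behaviour described in the list above.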
Suggested fix
Option A (minimal): Catch OSError (parent of ConnectionError) in connect_utils._cancel() and suppress it — cancel is best-effort:
async def _cancel(*, loop, addr, params, backend_pid, backend_secret):
    ...
    try:
        if params.ssl and params.sslmode != SSLMode.allow:
            tr, pr = await _create_ssl_connection(...)
        ...
    except OSError:
        # Cancel is best-effort. If we can't reach the server, the
        # connection is dead anyway.
        return
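As a runnable illustration of the suppression pattern in Option A, the sketch below uses a hypothetical open_cancel_connection helper in place of _create_ssl_connection. It is not the actual asyncpg code, only a demonstration that `except OSError` also covers ConnectionError.

```python
import asyncio


async def open_cancel_connection(fail):
    # Hypothetical stand-in for _create_ssl_connection.
    if fail:
        raise ConnectionError("unexpected connection_lost() call")
    return "transport", "protocol"


async def best_effort_cancel(fail):
    """Return True if the cancel request was sent, False if suppressed."""
    try:
        tr, pr = await open_cancel_connection(fail)
    except OSError:
        # ConnectionError is a subclass of OSError, so it is caught here.
        # Cancel is best-effort, so the failure is swallowed.
        return False
    return True


sent_ok = asyncio.run(best_effort_cancel(fail=False))
sent_failed = asyncio.run(best_effort_cancel(fail=True))
print(sent_ok, sent_failed)
```

Catching OSError rather than ConnectionError alone would presumably also cover other socket-level failures on the cancel connection, such as timeouts reported as OSError subclasses.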
Option B (comprehensive): Also change TLSUpgradeProto.connection_lost() to raise asyncpg.InterfaceError instead of built-in ConnectionError, so callers can handle it consistently:
def connection_lost(self, exc):
    if not self.on_data.done():
        if exc is None:
            exc = InterfaceError('unexpected connection_lost() call')
        self.on_data.set_exception(exc)
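A minimal, self-contained sketch of what Option B would look like to a caller. UpgradeProto and the InterfaceError class here are hypothetical stand-ins for TLSUpgradeProto and asyncpg.InterfaceError, assumed only for illustration.

```python
import asyncio


class InterfaceError(Exception):
    """Hypothetical stand-in for asyncpg.InterfaceError."""


class UpgradeProto(asyncio.Protocol):
    """Hypothetical stand-in for TLSUpgradeProto."""

    def __init__(self, loop):
        self.on_data = loop.create_future()

    def connection_lost(self, exc):
        if not self.on_data.done():
            if exc is None:
                exc = InterfaceError("unexpected connection_lost() call")
            self.on_data.set_exception(exc)


async def main():
    proto = UpgradeProto(asyncio.get_running_loop())
    # Simulate the transport closing before any data arrives.
    proto.connection_lost(None)
    try:
        await proto.on_data
    except InterfaceError as exc:
        return type(exc).__name__


caught = asyncio.run(main())
print(caught)
```

Note that, as in the snippet above, a non-None exc handed over by the transport is still forwarded unchanged, so Option B on its own would not wrap every failure; that is why it is proposed in addition to Option A.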
Impact
This causes process crashes in production services. When a task is cancelled during a DB query, the ConnectionError escapes all exception handlers (which expect either CancelledError or asyncpg-specific exceptions) and terminates the process.
This is 100% correlated with CancelledError in our logs — every ConnectionError: unexpected connection_lost() we've seen is triggered by task cancellation.
Additional context
We use Google CloudSQL with SSL connections. The PostgreSQL server is accessed over SSL (non-direct_tls), which means the cancel code path goes through _create_ssl_connection to establish a new SSL connection for sending the cancel request.