Summary
asyncpg.connect_utils.TLSUpgradeProto.data_received compares the server's response to SSLRequest against exact single-byte values (data == b'S', or data == b'N' under narrow conditions) and raises a generic ConnectionError("... rejected SSL upgrade") for anything else. This silently hides real ErrorResponse messages that the PostgreSQL server sends when it cannot accept the connection for a pre-auth reason (e.g., could not fork new process for connection: Cannot allocate memory, too many connections, or an early-stage auth failure).
psql (libpq) against the exact same server/DSN reports the real error message.
Reproduction
Trigger the failure by exhausting the PostgreSQL server's ability to fork a new backend while asyncpg is opening a pool:
- Point asyncpg at a Postgres instance that is at or near its max_connections / memory-for-backends limit.
- Call asyncpg.create_pool(..., min_size=N) with N large enough to push the server past its limit.
- The first few connections succeed with data == b'S'. A subsequent connection receives the raw bytes the server writes before closing the connection.
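For anyone who wants to look at the raw reply without involving asyncpg at all, here is a minimal sketch using only the standard library. It sends the 8-byte SSLRequest packet and returns whatever comes back. The names probe_ssl_request, _fake_overloaded_server, and demo are hypothetical, and the fake server merely stands in for an overloaded Postgres so the sketch can run locally; this is not how our capture was made.

```python
import asyncio
import struct

# SSLRequest: int32 length (8) followed by int32 request code 80877103.
SSL_REQUEST = struct.pack("!ii", 8, 80877103)


async def probe_ssl_request(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Send SSLRequest and return the first chunk of the server's raw reply."""
    reader, writer = await asyncio.wait_for(
        asyncio.open_connection(host, port), timeout)
    try:
        writer.write(SSL_REQUEST)
        await writer.drain()
        # A healthy server answers with a single b'S' or b'N'.
        return await asyncio.wait_for(reader.read(1024), timeout)
    finally:
        writer.close()
        try:
            await writer.wait_closed()
        except ConnectionError:
            pass  # the peer may already have dropped the connection


async def _fake_overloaded_server(reader, writer):
    """Hypothetical stand-in for a server that cannot fork a backend."""
    await reader.readexactly(len(SSL_REQUEST))
    writer.write(
        b"Ecould not fork new process for connection: "
        b"Cannot allocate memory\n\x00")
    await writer.drain()
    writer.close()


async def demo() -> bytes:
    """Run the probe against the local fake server on an ephemeral port."""
    server = await asyncio.start_server(
        _fake_overloaded_server, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await probe_ssl_request("127.0.0.1", port)
    finally:
        server.close()


if __name__ == "__main__":
    print(repr(asyncio.run(demo())))
```

Against a real server, probe_ssl_request(host, port) can be called directly; a reply that does not start with b'S' or b'N' is the case this issue is about.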
On Azure PostgreSQL Flexible Server we captured:
len=68
hex=45636f756c64206e6f7420666f726b206e65772070726f6365737320666f7220636f6e6e656374696f6e3a2043616e6e6f7420616c6c6f63617465206d656d6f72790a00
repr=b'Ecould not fork new process for connection: Cannot allocate memory\n\x00'
The leading byte is b'E', which is the PostgreSQL wire-protocol message type for ErrorResponse, and the payload is the human-readable message (no length header — this is the pre-auth simplified form the backend emits when it cannot even reach the startup state machine).
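The capture can be checked by hand. A small sketch, using only the hex string quoted above (the variable names are ours, for illustration):

```python
# Hex dump copied verbatim from the capture above.
CAPTURED_HEX = (
    "45636f756c64206e6f7420666f726b206e65772070726f6365737320666f7220636f6e6e656374696f6e3a2043616e6e6f7420616c6c6f63617465206d656d6f72790a00"
)

captured = bytes.fromhex(CAPTURED_HEX)

# First byte is the message type; the rest is plain text ending in "\n\x00".
message_type = captured[:1]
text = captured[1:].rstrip(b"\x00").decode("ascii").strip()

print(len(captured), message_type, text)
```

There is no 4-byte length field after the type byte: the bytes immediately following b'E' are already the ASCII text of the message.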
Current behaviour
# connect_utils.py — TLSUpgradeProto.data_received
def data_received(self, data):
    if data == b'S':
        self.on_data.set_result(True)
    elif (self.ssl_is_advisory and
            self.ssl_context.verify_mode == ssl_module.CERT_NONE and
            data == b'N'):
        self.on_data.set_result(False)
    else:
        self.on_data.set_exception(
            ConnectionError(
                'PostgreSQL server at "{host}:{port}" '
                'rejected SSL upgrade'.format(
                    host=self.host, port=self.port)))
Any payload that is not exactly b'S' or (under narrow conditions) exactly b'N' is mislabeled as "rejected SSL upgrade", regardless of what the server actually said. The caller has no way to see the real reason — the raw bytes are discarded.
Expected behaviour
When the server's first byte is b'E', asyncpg should decode the payload as an ErrorResponse and raise a PostgresError (or InterfaceError) carrying the server's message, so the caller can see could not fork new process for connection: Cannot allocate memory instead of a generic, misleading SSL-themed error.
libpq (psql) reports the server message directly in this scenario, so this is also a libpq-parity gap.
Impact
We spent ~2 days chasing an "SSL upgrade rejection" error that was never about SSL. The real problem was connection-pool sizing that exhausted the Postgres server's memory. Because every trace contained "SSL" and "rejected", every hypothesis we pursued (sslmode, SSLContext, direct_tls, Azure firewall, certificate validation) was a dead end. Once we monkey-patched TLSUpgradeProto.data_received to log the raw bytes, the could not fork new process for connection: Cannot allocate memory message made the cause immediately obvious.
This same failure mode will apply to any pre-auth server-side error:
- OOM (our case)
- too many connections
- no pg_hba.conf entry for host
- FATAL: remaining connection slots are reserved
- FATAL: password authentication failed (when sent before SSL negotiation by some forks/proxies)
All of them become "rejected SSL upgrade" in asyncpg today.
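Until this is fixed, the raw reply can be surfaced with a logging wrapper along the lines of the monkey-patch mentioned above. This is a hedged reconstruction, not the exact patch we ran: log_raw_replies and the logger name are hypothetical, and the wrapper works on any class that has a data_received(self, data) method.

```python
import functools
import logging

log = logging.getLogger("tls_upgrade_debug")  # hypothetical logger name


def log_raw_replies(proto_cls):
    """Wrap proto_cls.data_received so every raw reply is logged first.

    Returns the original method so the patch can be undone later.
    """
    original = proto_cls.data_received

    @functools.wraps(original)
    def data_received(self, data):
        log.warning(
            "SSLRequest reply: len=%d hex=%s repr=%r",
            len(data), data.hex(), data)
        return original(self, data)

    proto_cls.data_received = data_received
    return original


# Usage (assumes asyncpg is installed; apply before creating the pool):
#   import asyncpg.connect_utils
#   log_raw_replies(asyncpg.connect_utils.TLSUpgradeProto)
```

The wrapper only observes the bytes and then defers to the original method, so asyncpg's behaviour, including the misleading exception, is unchanged.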
Suggested fix
Expand TLSUpgradeProto.data_received to branch on the first byte:
def data_received(self, data):
    if not data:
        return
    first = data[:1]
    if first == b'S':
        self.on_data.set_result(True)
        return
    if (self.ssl_is_advisory
            and self.ssl_context.verify_mode == ssl_module.CERT_NONE
            and first == b'N'):
        self.on_data.set_result(False)
        return
    if first == b'E':
        # Server sent an ErrorResponse. Try to decode the human message.
        # Handles both the pre-auth raw-ascii form ('E' + text) and the
        # wire-protocol form ('E' + int32 length + NUL-terminated fields
        # where 'M' is the human message).
        # _decode_error_response is a new helper that would need to be added.
        message = _decode_error_response(data)
        exc = exceptions.PostgresError(
            message or 'server error during SSL negotiation'
        )
        self.on_data.set_exception(exc)
        return
    self.on_data.set_exception(
        ConnectionError(
            f'PostgreSQL server at "{self.host}:{self.port}" '
            f'sent unexpected byte {first!r} in response to SSLRequest'))
The existing generic ConnectionError can stay as the fallback for bytes that are neither S, N, nor E, with a more informative message that shows the actual byte received.
I can put together a PR with the above and tests if that would be welcome.
Version info
- asyncpg==0.30.0
- Python 3.12
- PostgreSQL: Azure Database for PostgreSQL Flexible Server (but reproducible on any server that exhausts its fork budget)
- OS: Linux (Azure Container Apps)
Related
prefer behaves differently than libpq): different symptom, same class of "asyncpg handles SSL negotiation differently from libpq" divergence.