
TLSUpgradeProto.data_received hides PostgreSQL ErrorResponse behind generic 'rejected SSL upgrade' ConnectionError #1317

@esperie

Description


Summary

asyncpg.connect_utils.TLSUpgradeProto.data_received matches the server's response to SSLRequest with an exact single-byte equality data == b'S', falling through to a generic ConnectionError("... rejected SSL upgrade") on any other value. This silently hides real ErrorResponse messages that the PostgreSQL server sends when it cannot accept the connection for a pre-auth reason (e.g., could not fork new process for connection: Cannot allocate memory, too many connections, or an early-stage auth failure).
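For context, the SSLRequest that starts this exchange is an 8-byte packet: an Int32 length of 8 followed by the magic code 80877103 (1234 in the high 16 bits, 5679 in the low 16 bits). The server is expected to reply with a single byte. The sketch below shows how that packet is laid out using only the standard library. build_ssl_request is a hypothetical helper for illustration, not an asyncpg function.

```python
import struct

# SSLRequest code from the PostgreSQL frontend/backend protocol:
# 1234 in the high 16 bits and 5679 in the low 16 bits.
SSL_REQUEST_CODE = (1234 << 16) | 5679  # == 80877103


def build_ssl_request() -> bytes:
    """Hypothetical helper: the 8-byte SSLRequest (Int32 length, Int32 code)."""
    return struct.pack('!ii', 8, SSL_REQUEST_CODE)


packet = build_ssl_request()
print(packet.hex())  # 0000000804d2162f

# A well-behaved server answers with exactly one byte: b'S' (proceed with
# TLS) or b'N' (no TLS). The b'E' reply captured in this report is neither.
```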

psql (libpq) against the exact same server/DSN reports the real error message.

Reproduction

Trigger by exhausting the PostgreSQL server's ability to fork a new backend while asyncpg is opening a pool:

  1. Point asyncpg at a Postgres instance that is at or near its max_connections / memory-for-backends limit.
  2. Call asyncpg.create_pool(..., min_size=N) with N large enough to push the server past its limit.
  3. The first few connections succeed with data == b'S'. A later connection instead receives an error message that the server writes just before closing the socket, in place of the expected one-byte reply.

On Azure PostgreSQL Flexible Server we captured:

len=68
hex=45636f756c64206e6f7420666f726b206e65772070726f6365737320666f7220636f6e6e656374696f6e3a2043616e6e6f7420616c6c6f63617465206d656d6f72790a00
repr=b'Ecould not fork new process for connection: Cannot allocate memory\n\x00'

The leading byte is b'E', the PostgreSQL wire-protocol message type for ErrorResponse, and the rest of the payload is the human-readable message. There is no Int32 length header: this is the simplified, length-less error form the server sends when it cannot fork a backend process for the connection, before the normal startup exchange has begun.
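The capture can be checked with the standard library alone. The sketch decodes the hex dump above, confirms the 68-byte length and the b'E' type byte, and shows why the length-less form cannot be parsed as a regular ErrorResponse: bytes 1 to 4 are ASCII text, so reading them as an Int32 length yields a value far larger than the message.

```python
# Hex dump from the report, split into string pieces only for line length.
captured = bytes.fromhex(
    '45636f756c64206e6f7420666f726b206e65772070726f6365737320666f72'
    '20636f6e6e656374696f6e3a2043616e6e6f7420616c6c6f63617465'
    '206d656d6f72790a00'
)

msg_type, body = captured[:1], captured[1:]
# Drop the trailing NUL and newline to recover the human-readable text.
text = body.rstrip(b'\x00').decode('ascii').rstrip('\n')

# Reading bytes 1..4 as a big-endian Int32 length (as a v3 ErrorResponse
# parser would) gives a nonsensical value, since they are the text "coul".
bogus_length = int.from_bytes(captured[1:5], 'big')

print(len(captured), msg_type, text)
# 68 b'E' could not fork new process for connection: Cannot allocate memory
```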

Current behaviour

# connect_utils.py — TLSUpgradeProto.data_received
def data_received(self, data):
    if data == b'S':
        self.on_data.set_result(True)
    elif (self.ssl_is_advisory and
            self.ssl_context.verify_mode == ssl_module.CERT_NONE and
            data == b'N'):
        self.on_data.set_result(False)
    else:
        self.on_data.set_exception(
            ConnectionError(
                'PostgreSQL server at "{host}:{port}" '
                'rejected SSL upgrade'.format(
                    host=self.host, port=self.port)))

Any payload that is not exactly b'S' or (under narrow conditions) exactly b'N' is mislabeled as "rejected SSL upgrade", regardless of what the server actually said. The caller has no way to see the real reason — the raw bytes are discarded.
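The mislabeling can be reproduced without a server. The sketch below is a standalone mirror of the branch structure quoted above (current_outcome is a hypothetical name; it copies only the control flow, not asyncpg's future or exception handling), fed the bytes captured on Azure.

```python
import ssl


def current_outcome(data: bytes, ssl_is_advisory: bool,
                    verify_mode: ssl.VerifyMode) -> str:
    """Mirror of the branches in the current data_received (illustrative)."""
    if data == b'S':
        return 'tls'
    elif (ssl_is_advisory and verify_mode == ssl.CERT_NONE
            and data == b'N'):
        return 'plaintext'
    else:
        # Every other reply, including a real server error, lands here.
        return 'rejected SSL upgrade'


captured = (b'Ecould not fork new process for connection: '
            b'Cannot allocate memory\n\x00')
print(current_outcome(captured, True, ssl.CERT_NONE))  # rejected SSL upgrade
```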

Expected behaviour

When the server's first byte is b'E', asyncpg should decode the payload as an ErrorResponse and raise a PostgresError (or InterfaceError) carrying the server's message, so the caller sees "could not fork new process for connection: Cannot allocate memory" instead of a generic, misleading SSL-themed error.

libpq (psql) reports the server message directly in this scenario, so this is also a libpq-parity gap.

Impact

We spent about two days chasing an "SSL upgrade rejection" that had nothing to do with SSL. The real problem was an oversized connection pool exhausting the Postgres server's memory, but every trace said "SSL" and "rejected", so every hypothesis we tested (sslmode, SSLContext, direct_tls, Azure firewall, certificate validation) was a dead end. Once we monkey-patched TLSUpgradeProto.data_received to log the raw bytes, the "could not fork new process for connection: Cannot allocate memory" message made the cause immediately obvious.
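A diagnostic patch along the lines described above might look like the following sketch. install_raw_reply_logger and the logger name are hypothetical; the function wraps whatever class it is given, so it can be exercised without a database. TLSUpgradeProto is a private asyncpg class, so patching it is a debugging aid only and may break between releases.

```python
import functools
import logging

logger = logging.getLogger('ssl_reply_debug')  # hypothetical logger name


def install_raw_reply_logger(proto_cls):
    """Wrap proto_cls.data_received so each raw reply is logged before the
    original handler runs. Returns the original method so it can be restored."""
    original = proto_cls.data_received

    @functools.wraps(original)
    def data_received(self, data):
        logger.warning('SSLRequest reply: len=%d hex=%s repr=%r',
                       len(data), data.hex(), data)
        return original(self, data)

    proto_cls.data_received = data_received
    return original


# Usage against asyncpg 0.30 (private API, diagnostic use only):
#   import asyncpg.connect_utils
#   install_raw_reply_logger(asyncpg.connect_utils.TLSUpgradeProto)
```

Logging the length, hex and repr together matches the three views of the capture shown in the reproduction section.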

The same failure mode should apply to any pre-auth server-side error that reaches the client as the reply to SSLRequest, for example:

  • OOM (our case)
  • too many connections
  • no pg_hba.conf entry for host
  • FATAL: remaining connection slots are reserved
  • FATAL: password authentication failed (when sent before SSL negotiation by some forks/proxies)

All of them become "rejected SSL upgrade" in asyncpg today.

Suggested fix

Expand TLSUpgradeProto.data_received to branch on the first byte:

def data_received(self, data):
    if not data:
        return
    first = data[:1]
    # Keep exact matches for b'S' and b'N', as the current code does: the
    # server must send exactly one byte before the TLS handshake, so b'S'
    # followed by extra bytes is malformed and must not count as acceptance.
    if data == b'S':
        self.on_data.set_result(True)
        return
    if (self.ssl_is_advisory
            and self.ssl_context.verify_mode == ssl_module.CERT_NONE
            and data == b'N'):
        self.on_data.set_result(False)
        return
    if first == b'E':
        # Server sent an ErrorResponse. Try to decode the human message.
        # Handles both the pre-auth raw-text form ('E' + text) and the
        # wire-protocol form ('E' + int32 length + NUL-terminated fields
        # where 'M' is the human message). _decode_error_response is a
        # new helper that this change would need to add.
        message = _decode_error_response(data)
        exc = exceptions.PostgresError(
            message or 'server error during SSL negotiation'
        )
        self.on_data.set_exception(exc)
        return
    self.on_data.set_exception(
        ConnectionError(
            f'PostgreSQL server at "{self.host}:{self.port}" '
            f'sent unexpected data {data[:16]!r} in response to SSLRequest'))

The existing generic ConnectionError can stay as the fallback for bytes that are neither S, N, nor E, with a more informative message that shows the actual byte received.

I can put together a PR with the above and tests if that would be welcome.

Version info

  • asyncpg==0.30.0
  • Python 3.12
  • PostgreSQL: Azure Database for PostgreSQL Flexible Server (but reproducible on any server that exhausts its fork budget)
  • OS: Linux (Azure Container Apps)
