fix: add TCP keepalive to default httpx transport to prevent NAT hangs #3368
Oxygen56 wants to merge 1 commit into
httpx's default transport does not enable TCP keepalive, causing non-streaming requests to hang silently behind NAT when idle connections are dropped by the NAT gateway. This adds socket-level TCP keepalive options (SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) to the default HTTP client transport.
Fixes openai#3269
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca02bf25f7
kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
kwargs.setdefault("limits", DEFAULT_CONNECTION_LIMITS)
kwargs.setdefault("follow_redirects", True)
kwargs.setdefault("transport", httpx.HTTPTransport(socket_options=_build_keepalive_socket_options()))
Raise the httpx floor before using socket_options
This package still declares httpx>=0.23.0, <1 in pyproject.toml, but socket_options was only added to HTTPTransport/AsyncHTTPTransport in httpx 0.25.0. In environments that satisfy the current dependency with httpx 0.23.x or 0.24.x, constructing the default OpenAI client will raise TypeError: ... unexpected keyword argument 'socket_options' before any request is made. Please either guard this argument for older httpx versions or bump the minimum dependency.
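One way to guard the argument, as suggested above, is to check at runtime whether the transport constructor accepts `socket_options` before passing it. The following is only a sketch under that assumption; `accepts_keyword` and `optional_socket_options` are hypothetical helper names, not part of the SDK or of httpx.

```python
import inspect


def accepts_keyword(callable_obj, name):
    """Return True if callable_obj can be called with keyword argument `name`."""
    try:
        params = inspect.signature(callable_obj).parameters
    except (TypeError, ValueError):
        # Signature not introspectable; assume the keyword is unsupported.
        return False
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter also accepts arbitrary keywords.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def optional_socket_options(transport_cls, socket_options):
    """Return {'socket_options': ...} only if transport_cls supports it (hypothetical helper)."""
    if accepts_keyword(transport_cls, "socket_options"):
        return {"socket_options": socket_options}
    return {}
```

With a helper like this, the default could be built as `httpx.HTTPTransport(**optional_socket_options(httpx.HTTPTransport, options))`, so older httpx versions would fall back to a transport without keepalive instead of raising `TypeError`. Bumping the minimum dependency remains the simpler alternative.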
kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
kwargs.setdefault("limits", DEFAULT_CONNECTION_LIMITS)
kwargs.setdefault("follow_redirects", True)
kwargs.setdefault("transport", httpx.HTTPTransport(socket_options=_build_keepalive_socket_options()))
Preserve httpx client options when adding keepalive
Creating and passing a concrete transport here causes httpx.Client to use that transport as-is, so the limits, verify, cert, trust_env, http1, and http2 kwargs set on DefaultHttpxClient(...) are no longer applied to the default non-proxy transport. This regresses both the SDK's DEFAULT_CONNECTION_LIMITS and documented customization such as DefaultHttpxClient(http2=True) unless callers manually build their own transport. Please build the transport with the effective kwargs or use an approach that does not bypass httpx's transport initialization.
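A rough sketch of "build the transport with the effective kwargs" might look like the following. It assumes the option names listed in this comment are the ones to forward; `TRANSPORT_OPTION_NAMES` and `effective_transport_kwargs` are hypothetical names introduced for illustration, and the sketch does not address proxy handling.

```python
# Client kwargs named in the review comment that a prebuilt transport would
# otherwise not receive. The tuple itself is an assumption for illustration.
TRANSPORT_OPTION_NAMES = ("limits", "verify", "cert", "trust_env", "http1", "http2")


def effective_transport_kwargs(client_kwargs, socket_options):
    """Collect the transport-level options from client kwargs (hypothetical helper).

    Only options the caller (or an earlier setdefault) actually provided are
    forwarded, so the transport's own defaults apply to everything else.
    """
    transport_kwargs = {
        name: client_kwargs[name]
        for name in TRANSPORT_OPTION_NAMES
        if name in client_kwargs
    }
    transport_kwargs["socket_options"] = socket_options
    return transport_kwargs
```

The default transport could then be created as `httpx.HTTPTransport(**effective_transport_kwargs(kwargs, options))` after the `limits` default has been set, so `DEFAULT_CONNECTION_LIMITS` and options such as `http2=True` would reach the transport. The same helper would apply to `httpx.AsyncHTTPTransport`.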
kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
kwargs.setdefault("limits", DEFAULT_CONNECTION_LIMITS)
kwargs.setdefault("follow_redirects", True)
kwargs.setdefault("transport", httpx.AsyncHTTPTransport(socket_options=_build_keepalive_socket_options()))
Preserve async httpx client options when adding keepalive
The async default client has the same issue: supplying a prebuilt AsyncHTTPTransport means httpx.AsyncClient does not apply its limits, TLS, env, or HTTP/2 options to the default transport. Users relying on DefaultAsyncHttpxClient(http2=True) or the SDK's default connection limits silently get httpx transport defaults instead. Please pass the effective options into the transport or avoid replacing httpx's normal transport construction.
Fixes #3269 — Non-streaming calls silently hang forever behind NAT.
httpx's default transport does not enable TCP keepalive, causing non-streaming requests to hang silently behind NAT when idle connections are dropped by the NAT gateway.
This adds socket-level TCP keepalive options to the default HTTP client transport:
- SO_KEEPALIVE: enable keepalive probes
- TCP_KEEPIDLE (Linux) / TCP_KEEPALIVE (macOS): 60s idle before the first probe
- TCP_KEEPINTVL: 60s between subsequent probes
- TCP_KEEPCNT: 5 unacknowledged probes before declaring the connection dead

Uses kwargs.setdefault so any caller-supplied custom transport is completely unaffected.
This is the same approach used by the Anthropic Python SDK.
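For readers unfamiliar with these options, a minimal sketch of how such a list could be assembled with the standard `socket` module is shown below. It is not the PR's actual `_build_keepalive_socket_options` helper; the function name, the constants, and the fallback logic are assumptions, with the 60s/60s/5 values taken from the description above.

```python
import socket

# Values from the PR description; the constant names are hypothetical.
KEEPALIVE_IDLE_SECONDS = 60
KEEPALIVE_INTERVAL_SECONDS = 60
KEEPALIVE_PROBE_COUNT = 5


def build_keepalive_socket_options():
    """Return (level, optname, value) tuples that enable TCP keepalive."""
    options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]

    # Linux exposes TCP_KEEPIDLE; macOS uses TCP_KEEPALIVE for the idle time.
    # Not every platform or Python build defines these, so look them up lazily.
    idle_opt = getattr(socket, "TCP_KEEPIDLE", None)
    if idle_opt is None:
        idle_opt = getattr(socket, "TCP_KEEPALIVE", None)
    if idle_opt is not None:
        options.append((socket.IPPROTO_TCP, idle_opt, KEEPALIVE_IDLE_SECONDS))

    if hasattr(socket, "TCP_KEEPINTVL"):
        options.append(
            (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, KEEPALIVE_INTERVAL_SECONDS)
        )
    if hasattr(socket, "TCP_KEEPCNT"):
        options.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, KEEPALIVE_PROBE_COUNT))

    return options
```

With these values, a dead peer would be detected roughly 60 + 5 × 60 = 360 seconds after the connection goes idle, assuming the platform honors all three TCP-level settings.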