fix(sdk): raise asyncio StreamReader buffer in Python AsyncHostTransport #2760
Conversation
The Python async transport spawned the host CLI without passing a `limit=`
to `asyncio.create_subprocess_exec`, so its stdout `StreamReader` inherited
asyncio's default 64 KiB buffer. Every host response is written as a single
newline-delimited JSON line, so any `cli.invoke` whose serialized result
exceeds 64 KiB (e.g. `superdoc_get_content` on larger documents) caused
`readline()` to raise `ValueError: Separator is not found, and chunk
exceed the limit` inside `_reader_loop`. The exception was caught by the
generic reader-loop handler and pending requests were rejected with the
misleading `HOST_DISCONNECTED` error — even though the host process was
still alive and healthy.
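The failure mode is reproducible without the SDK at all; a minimal sketch, with the host CLI stood in for by a plain Python child process that prints one oversized line:

```python
import asyncio
import sys

# Standalone reproduction: read a single newline-delimited line longer
# than asyncio's default 64 KiB StreamReader buffer.
async def repro() -> str:
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('x' * 100_000)",
        stdout=asyncio.subprocess.PIPE,  # no limit= -> 64 KiB default
    )
    try:
        await proc.stdout.readline()     # the line never fits the buffer
        return "no error"
    except ValueError as exc:
        return str(exc)                  # same error the SDK's reader loop saw
    finally:
        await proc.wait()

msg = asyncio.run(repro())
```

Internally, `readline()` hits `asyncio.LimitOverrunError` in `readuntil()` and re-raises it as the `ValueError` quoted above, which is why a generic `except Exception` handler in a reader loop swallows it indistinguishably from a real disconnect.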
Pass `limit=` to `create_subprocess_exec` and expose it as a new
`stdout_buffer_limit_bytes` constructor option on `AsyncHostTransport`,
threaded through `SuperDocAsyncRuntime` and `AsyncSuperDocClient`. The
default of 64 MiB safely covers the host's own 32 MiB
`DEFAULT_MAX_STDIN_BYTES` input cap with room for ~2x JSON expansion.
`SyncHostTransport` is unaffected — it uses raw blocking `subprocess.Popen`
which has no asyncio buffer limit.
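The shape of the fix can be sketched independently of the SDK — the only change is threading a `limit=` into the spawn call (the helper name and 64 MiB constant below mirror the PR's description but are illustrative, not the SDK's actual code):

```python
import asyncio
import sys

# Mirrors the PR's new default; tune upward if a single host response
# can exceed it.
DEFAULT_STDOUT_BUFFER_LIMIT_BYTES = 64 * 1024 * 1024

async def read_one_big_line() -> bytes:
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('x' * 200_000)",
        stdout=asyncio.subprocess.PIPE,
        # Without limit=, readline() raises ValueError for any single
        # line over asyncio's 64 KiB default.
        limit=DEFAULT_STDOUT_BUFFER_LIMIT_BYTES,
    )
    line = await proc.stdout.readline()
    await proc.wait()
    return line

line = asyncio.run(read_one_big_line())
```

With the raised limit, the same 200 KB single-line payload that previously crashed the reader round-trips cleanly.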
Adds a `TestAsyncLargeResponse` regression suite that:
1. Round-trips a 200 KB response through the default-configured transport.
2. Pins that an explicitly tightened `stdout_buffer_limit_bytes` still
reproduces the original failure mode, guaranteeing the option is
wired through to `create_subprocess_exec`.
caio-pizzol left a comment
@michaelreavant — fix is right, flagging a few things before merge.
the main one: when the buffer limit is hit, the error handler at transport.py:619-631 marks the transport as disconnected but doesn't kill the host process. a later dispose() sees the state as already disconnected and returns early, so the host keeps running. your PR raises the ceiling (64 MiB vs 64 KiB) but doesn't close this gap — any caller who sets stdout_buffer_limit_bytes below their real response size will leave host processes running in the background.
fix: in the except Exception branch, kill the process and clear self._process before flipping state. the cleanest way is to schedule a _cleanup task from that branch.
while you're there: the error code on that path is still HOST_DISCONNECTED, which points people at the wrong problem since the host didn't actually die. catching ValueError / asyncio.LimitOverrunError separately and surfacing HOST_PROTOCOL_ERROR with a "raise stdout_buffer_limit_bytes" hint would be much more actionable.
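Both suggestions can be sketched together on a toy transport. Everything below (`MiniTransport`, `_schedule_cleanup`, `_cleanup`, the error-code strings) is a hypothetical illustration of the shape being proposed, not the SDK's actual code:

```python
import asyncio

class MiniTransport:
    def __init__(self):
        self.cleaned_up = False
        self.last_error_code = None

    async def _cleanup(self):
        # Real code would kill the host process, await it, reject pending
        # requests, and only then flip state to DISCONNECTED, so a later
        # dispose() is a safe no-op instead of an early return that leaks.
        self.cleaned_up = True

    def _schedule_cleanup(self):
        # _cleanup cancels/awaits the reader task in the real transport,
        # so it cannot be awaited inline from the reader loop itself.
        asyncio.get_running_loop().create_task(self._cleanup())

    async def _reader_loop(self, reader):
        try:
            while True:
                line = await reader.readline()
                if not line:
                    break
        except (asyncio.LimitOverrunError, ValueError):
            # Buffer overflow: the host is still alive, so point users at
            # the actual knob instead of claiming a disconnect.
            self.last_error_code = "HOST_PROTOCOL_ERROR"
            self._schedule_cleanup()
        except Exception:
            self.last_error_code = "HOST_DISCONNECTED"
            self._schedule_cleanup()

async def demo():
    t = MiniTransport()
    reader = asyncio.StreamReader(limit=64)  # deliberately tiny limit
    reader.feed_data(b"x" * 1024)            # oversized "line", no newline
    reader.feed_eof()
    await t._reader_loop(reader)
    await asyncio.sleep(0)                   # let the cleanup task run
    return t

t = asyncio.run(demo())
```

The key ordering is that cleanup kills and reaps the process before the state transition, so the "already disconnected" fast path in `dispose()` can never observe a live process.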
smaller notes inline.
```python
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.DEVNULL,
env={**os.environ, **self._env},
limit=self._stdout_buffer_limit_bytes,
```
this exposes the bug down at transport.py:619-631 — see the summary. the fix belongs in that branch, not here.
```python
request_timeout_ms: Optional[int] = None,
watchdog_timeout_ms: int = 30_000,
max_queue_depth: int = 100,
stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,
```
64 * 1024 * 1024 also lives in runtime.py:82 and client.py:343. hoist a module-level constant so tuning is one place:
```diff
-stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,
+stdout_buffer_limit_bytes: int = _DEFAULT_STDOUT_BUFFER_LIMIT_BYTES,
```
(and define _DEFAULT_STDOUT_BUFFER_LIMIT_BYTES = 64 * 1024 * 1024 at module scope, then reference it from the three signatures)
```python
request_timeout_ms: int | None = None,
watchdog_timeout_ms: int = 30_000,
max_queue_depth: int = 100,
stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,
```
short note on when to raise this — today the reason only lives in a comment in transport.py.
```diff
-stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,
+stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,  # raise if responses exceed 64 MiB
```
```python
"""Responses larger than the StreamReader buffer must not crash the reader."""

@pytest.mark.asyncio
async def test_response_above_default_64kb_buffer(self):
```
64kb refers to the old default — after this PR it's 64 MiB, so the name reads backwards.
```diff
-async def test_response_above_default_64kb_buffer(self):
+async def test_response_above_asyncio_default_streamreader_limit(self):
```
```python
try:
    transport = AsyncHostTransport(
        cli,
        startup_timeout_ms=5_000,
        stdout_buffer_limit_bytes=64 * 1024,
    )
    await transport.connect()
    with pytest.raises(SuperDocError) as exc_info:
        await transport.invoke(_TEST_OP, {'query': 'big'})
    assert exc_info.value.code == HOST_DISCONNECTED
finally:
    _cleanup_wrapper(cli)
```
no `await transport.dispose()` on this path, so the host is left running after the test. dispose before the `finally` cleanup, like the sibling tests:
```diff
 try:
     transport = AsyncHostTransport(
         cli,
         startup_timeout_ms=5_000,
         stdout_buffer_limit_bytes=64 * 1024,
     )
     await transport.connect()
     with pytest.raises(SuperDocError) as exc_info:
         await transport.invoke(_TEST_OP, {'query': 'big'})
     assert exc_info.value.code == HOST_DISCONNECTED
+    await transport.dispose()
 finally:
     _cleanup_wrapper(cli)
```
```python
_cleanup_wrapper(cli2)


class TestAsyncLargeResponse:
```
both tests build AsyncHostTransport directly, so if someone drops the arg in client.py or runtime.py the public API breaks silently while these still pass. worth one small test that builds AsyncSuperDocClient(env={'SUPERDOC_CLI_BIN': cli}, stdout_buffer_limit_bytes=64*1024) and checks the value reaches the transport.
AsyncHostTransport._reader_loop caught reader exceptions by rejecting pending futures and flipping state to DISCONNECTED, but never killed self._process. Because dispose() early-returns on DISCONNECTED, any reader-loop failure left an orphaned host subprocess running with no public API to reap it. This is a pre-existing bug, but the previous commit made it easier to trip by exposing stdout_buffer_limit_bytes: any caller who sets it below their real response size hits the orphan path.

Route both the buffer-overflow and generic-error branches through a new _schedule_cleanup helper that fires _cleanup() as a separate task (it can't be awaited inline — _cleanup cancels and awaits the reader task itself). _cleanup kills the process, waits on it, rejects pending, and only then transitions to DISCONNECTED, so a subsequent dispose() is a safe no-op instead of leaking the host.

Also catch asyncio.LimitOverrunError / ValueError separately and surface HOST_PROTOCOL_ERROR with a "raise stdout_buffer_limit_bytes" hint plus the current limit in details. The previous HOST_DISCONNECTED code pointed users at the wrong problem since the host was still alive.

Extends TestAsyncLargeResponse to assert HOST_PROTOCOL_ERROR, verify the hint is in the message, confirm the subprocess is actually reaped (returncode set, _process cleared), and that dispose() after an overflow is a safe no-op.
Address review follow-ups on the async transport buffer-limit option.

- Hoist DEFAULT_STDOUT_BUFFER_LIMIT_BYTES (64 MiB) to module scope in transport.py and reference it from AsyncHostTransport, the async runtime, and AsyncSuperDocClient so the default lives in one place instead of three copies of 64 * 1024 * 1024.
- Add a short "raise if a single host response can exceed this size" comment on the client.py parameter so callers see the guidance at the public API boundary, not buried in transport.py.
- Rename test_response_above_default_64kb_buffer to test_response_above_asyncio_default_streamreader_limit. 64 KiB is asyncio's default, not the SDK's (which is now 64 MiB), so the old name read backwards after this PR.
- Add test_client_threads_stdout_buffer_limit_to_transport: builds AsyncSuperDocClient with a custom limit and asserts the value reaches AsyncHostTransport. Without this, a silent drop of the arg in client.py or runtime.py would leave the existing overflow test passing while the public API reverts to the asyncio 64 KiB default.
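The constant-hoisting change described above can be sketched as follows; the class body is illustrative and the signature is simplified, not the SDK's actual code:

```python
# Single module-level source of truth for the default (transport.py in
# the real SDK); runtime and client signatures reference the same name,
# so tuning the default is a one-line change.
DEFAULT_STDOUT_BUFFER_LIMIT_BYTES = 64 * 1024 * 1024  # 64 MiB

class AsyncHostTransport:
    def __init__(
        self,
        cli: str,
        *,
        stdout_buffer_limit_bytes: int = DEFAULT_STDOUT_BUFFER_LIMIT_BYTES,
    ) -> None:
        # Stored and later passed as limit= to create_subprocess_exec.
        self._stdout_buffer_limit_bytes = stdout_buffer_limit_bytes

transport = AsyncHostTransport("superdoc-host")
tightened = AsyncHostTransport("superdoc-host", stdout_buffer_limit_bytes=64 * 1024)
```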
@caio-pizzol Thank you for the review! All issues were addressed.
Summary

`superdoc_get_content` on larger documents fails in the Python async SDK with a misleading `HOST_DISCONNECTED` error. Root cause is a missing `limit=` on the subprocess spawn — the stdout `StreamReader` inherits asyncio's 64 KiB default buffer, which any non-trivial response trips. This PR raises the buffer ceiling and exposes it as a configurable option.

Problem

The Python async transport spawned the host CLI without passing a `limit=` to `asyncio.create_subprocess_exec`, so its stdout `StreamReader` inherited asyncio's default 64 KiB buffer. Every host response is written as a single newline-delimited JSON line, so any `cli.invoke` whose serialized result exceeds 64 KiB (e.g. `superdoc_get_content` on larger documents) caused `readline()` to raise `ValueError: Separator is not found, and chunk exceed the limit` inside `_reader_loop`. The exception was caught by the generic reader-loop handler and pending requests were rejected with the misleading `HOST_DISCONNECTED` error — even though the host process was still alive and healthy.

Fix

Pass `limit=` to `create_subprocess_exec` and expose it as a new `stdout_buffer_limit_bytes` constructor option on `AsyncHostTransport`, threaded through `SuperDocAsyncRuntime` and `AsyncSuperDocClient`. The default of 64 MiB safely covers the host's own 32 MiB `DEFAULT_MAX_STDIN_BYTES` input cap with room for ~2x JSON expansion.

Scope

`SyncHostTransport` is unaffected — it uses raw blocking `subprocess.Popen`, which has no asyncio buffer limit. No changes to the Node SDK or the host server.

Tests

Adds a `TestAsyncLargeResponse` regression suite that:

1. Round-trips a 200 KB response through the default-configured transport.
2. Pins that an explicitly tightened `stdout_buffer_limit_bytes` still reproduces the original failure mode, guaranteeing the option is wired through to `create_subprocess_exec`.

Bug reproduction was verified by stashing the fix and running the new test against the unmodified `transport.py` — it raised the exact `SuperDocError: Host process disconnected.` seen in production. With the fix in place, the full Python SDK test suite passes (90 tests, including 26 transport tests).

Test plan

- `pytest packages/sdk/langs/python/tests/` — 90 passed
- Reproduced the original `HOST_DISCONNECTED` failure when the fix is reverted
- `pnpm run generate:all` clean (no codegen drift in this branch)