
fix(sdk): raise asyncio StreamReader buffer in Python AsyncHostTransport #2760

Open
michaelreavant wants to merge 4 commits into superdoc-dev:main from michaelreavant:fix/python-sdk-async-transport-large-response-buffer

Conversation

@michaelreavant

Summary

`superdoc_get_content` on larger documents fails in the Python async SDK with a misleading `HOST_DISCONNECTED` error. Root cause is a missing `limit=` on the subprocess spawn: the stdout `StreamReader` inherits asyncio's 64 KiB default buffer, which any non-trivial response trips. This PR raises the buffer ceiling and exposes it as a configurable option.

Problem

The Python async transport spawned the host CLI without passing a `limit=` to `asyncio.create_subprocess_exec`, so its stdout `StreamReader` inherited asyncio's default 64 KiB buffer. Every host response is written as a single newline-delimited JSON line, so any `cli.invoke` whose serialized result exceeds 64 KiB (e.g. `superdoc_get_content` on larger documents) caused `readline()` to raise `ValueError: Separator is not found, and chunk exceed the limit` inside `_reader_loop`. The exception was caught by the generic reader-loop handler and pending requests were rejected with the misleading `HOST_DISCONNECTED` error, even though the host process was still alive and healthy.

Fix

Pass `limit=` to `create_subprocess_exec` and expose it as a new `stdout_buffer_limit_bytes` constructor option on `AsyncHostTransport`, threaded through `SuperDocAsyncRuntime` and `AsyncSuperDocClient`. The default of 64 MiB safely covers the host's own 32 MiB `DEFAULT_MAX_STDIN_BYTES` input cap with room for ~2x JSON expansion.

Scope

`SyncHostTransport` is unaffected: it uses raw blocking `subprocess.Popen`, which has no asyncio buffer limit. No changes to the Node SDK or the host server.

Tests

Adds a `TestAsyncLargeResponse` regression suite that:

  1. Round-trips a 200 KB response through the default-configured transport.
  2. Pins that an explicitly tightened `stdout_buffer_limit_bytes` still reproduces the original failure mode, guaranteeing the option is wired through to `create_subprocess_exec`.

Bug reproduction was verified by stashing the fix and running the new test against the unmodified `transport.py`: it raised the exact `SuperDocError: Host process disconnected.` seen in production. With the fix in place, the full Python SDK test suite passes (90 tests, including 26 transport tests).

Test plan

  • `pytest packages/sdk/langs/python/tests/`: 90 passed
  • Verified the regression test reproduces the original `HOST_DISCONNECTED` failure when the fix is reverted
  • `pnpm run generate:all` clean (no codegen drift in this branch)

Contributor

@caio-pizzol caio-pizzol left a comment


@michaelreavant — fix is right, flagging a few things before merge.

the main one: when the buffer limit is hit, the error handler at transport.py:619-631 marks the transport as disconnected but doesn't kill the host process. a later dispose() sees the state as already disconnected and returns early, so the host keeps running. your PR raises the ceiling (64 MiB vs 64 KiB) but doesn't close this gap — any caller who sets stdout_buffer_limit_bytes below their real response size will leave host processes running in the background.

fix: in the except Exception branch, kill the process and clear self._process before flipping state. the cleanest way is to schedule a _cleanup task from that branch.

while you're there: the error code on that path is still HOST_DISCONNECTED, which points people at the wrong problem since the host didn't actually die. catching ValueError / asyncio.LimitOverrunError separately and surfacing HOST_PROTOCOL_ERROR with a "raise stdout_buffer_limit_bytes" hint would be much more actionable.

smaller notes inline.

stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.DEVNULL,
env={**os.environ, **self._env},
limit=self._stdout_buffer_limit_bytes,
Contributor


this exposes the bug down at transport.py:619-631 — see the summary. the fix belongs in that branch, not here.

request_timeout_ms: Optional[int] = None,
watchdog_timeout_ms: int = 30_000,
max_queue_depth: int = 100,
stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,
Contributor


64 * 1024 * 1024 also lives in runtime.py:82 and client.py:343. hoist a module-level constant so tuning is one place:

Suggested change
stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,
stdout_buffer_limit_bytes: int = _DEFAULT_STDOUT_BUFFER_LIMIT_BYTES,

(and define _DEFAULT_STDOUT_BUFFER_LIMIT_BYTES = 64 * 1024 * 1024 at module scope, then reference it from the three signatures)

request_timeout_ms: int | None = None,
watchdog_timeout_ms: int = 30_000,
max_queue_depth: int = 100,
stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,
Contributor


short note on when to raise this — today the reason only lives in a comment in transport.py.

Suggested change
stdout_buffer_limit_bytes: int = 64 * 1024 * 1024,
stdout_buffer_limit_bytes: int = 64 * 1024 * 1024, # raise if responses exceed 64 MiB

"""Responses larger than the StreamReader buffer must not crash the reader."""

@pytest.mark.asyncio
async def test_response_above_default_64kb_buffer(self):
Contributor


64kb refers to the old default — after this PR it's 64 MiB, so the name reads backwards.

Suggested change
async def test_response_above_default_64kb_buffer(self):
async def test_response_above_asyncio_default_streamreader_limit(self):

Comment on lines +527 to +538
try:
    transport = AsyncHostTransport(
        cli,
        startup_timeout_ms=5_000,
        stdout_buffer_limit_bytes=64 * 1024,
    )
    await transport.connect()
    with pytest.raises(SuperDocError) as exc_info:
        await transport.invoke(_TEST_OP, {'query': 'big'})
    assert exc_info.value.code == HOST_DISCONNECTED
finally:
    _cleanup_wrapper(cli)
Contributor


no await transport.dispose() on this path, so the host is left running after the test. wrap in try/finally like the sibling tests:

Suggested change
try:
    transport = AsyncHostTransport(
        cli,
        startup_timeout_ms=5_000,
        stdout_buffer_limit_bytes=64 * 1024,
    )
    await transport.connect()
    with pytest.raises(SuperDocError) as exc_info:
        await transport.invoke(_TEST_OP, {'query': 'big'})
    assert exc_info.value.code == HOST_DISCONNECTED
finally:
    _cleanup_wrapper(cli)
try:
    transport = AsyncHostTransport(
        cli,
        startup_timeout_ms=5_000,
        stdout_buffer_limit_bytes=64 * 1024,
    )
    await transport.connect()
    with pytest.raises(SuperDocError) as exc_info:
        await transport.invoke(_TEST_OP, {'query': 'big'})
    assert exc_info.value.code == HOST_DISCONNECTED
    await transport.dispose()
finally:
    _cleanup_wrapper(cli)

_cleanup_wrapper(cli2)


class TestAsyncLargeResponse:
Contributor


both tests build AsyncHostTransport directly, so if someone drops the arg in client.py or runtime.py the public API breaks silently while these still pass. worth one small test that builds AsyncSuperDocClient(env={'SUPERDOC_CLI_BIN': cli}, stdout_buffer_limit_bytes=64*1024) and checks the value reaches the transport.

caio-pizzol and others added 3 commits April 13, 2026 22:57
AsyncHostTransport._reader_loop caught reader exceptions by rejecting
pending futures and flipping state to DISCONNECTED, but never killed
self._process. Because dispose() early-returns on DISCONNECTED, any
reader-loop failure left an orphaned host subprocess running with no
public API to reap it. This is a pre-existing bug, but the previous
commit made it easier to trip by exposing stdout_buffer_limit_bytes:
any caller who sets it below their real response size hits the orphan path.

Route both the buffer-overflow and generic-error branches through a
new _schedule_cleanup helper that fires _cleanup() as a separate task
(it can't be awaited inline — _cleanup cancels and awaits the reader
task itself). _cleanup kills the process, waits on it, rejects pending,
and only then transitions to DISCONNECTED, so a subsequent dispose()
is a safe no-op instead of leaking the host.

Also catch asyncio.LimitOverrunError / ValueError separately and
surface HOST_PROTOCOL_ERROR with a "raise stdout_buffer_limit_bytes"
hint plus the current limit in details. The previous HOST_DISCONNECTED
code pointed users at the wrong problem since the host was still alive.

Extends TestAsyncLargeResponse to assert HOST_PROTOCOL_ERROR, verify
the hint is in the message, confirm the subprocess is actually reaped
(returncode set, _process cleared), and that dispose() after an overflow is a safe no-op.
Address review follow-ups on the async transport buffer-limit option.

- Hoist DEFAULT_STDOUT_BUFFER_LIMIT_BYTES (64 MiB) to module scope in
  transport.py and reference it from AsyncHostTransport, the async
  runtime, and AsyncSuperDocClient so the default lives in one place
  instead of three copies of 64 * 1024 * 1024.

- Add a short "raise if a single host response can exceed this size"
  comment on the client.py parameter so callers see the guidance at
  the public API boundary, not buried in transport.py.

- Rename test_response_above_default_64kb_buffer to
  test_response_above_asyncio_default_streamreader_limit. 64 KiB is
  asyncio's default, not the SDK's (which is now 64 MiB), so the old
  name read backwards after this PR.

- Add test_client_threads_stdout_buffer_limit_to_transport: builds
  AsyncSuperDocClient with a custom limit and asserts the value
  reaches AsyncHostTransport. Without this, a silent drop of the arg
  in client.py or runtime.py would leave the existing overflow test
  passing while the public API reverts to the asyncio 64 KiB default.
@michaelreavant
Author

@caio-pizzol Thank you for the review! All issues were addressed.
