HDDS-15551. Stream reads should respect gRPC flow control backpressure #10522
ss77892 wants to merge 3 commits into
    return responseQueue.isEmpty() ? null : readFromQueue();
  }

  refreshReadDeadline();
This seems unnecessary, since readBlock -> readBlockImpl -> setReadDeadlineNs already sets the deadline.
readBlock() calls readBlockImpl() (which sets the deadline) only when readLength > 0. When a read is satisfied from already-requested data (readLength <= 0), readBlockImpl() is skipped, so poll() would otherwise wait against a stale deadline from a previous read and could time out immediately.
TestStreamBlockInputStream#testReadWithoutNewRequestGetsFreshTimeoutBudget covers this path.
In the original code, poll() initializes startTime at the beginning, so the original code does not have the bug described. No?
Yes, in the original code startTime was initialized locally, so a stale deadline wasn't possible there. This PR changes the model: the per-poll local budget has been replaced with a shared round-trip deadline used by both streamRead() and poll(). Once the deadline is set, it has to be re-armed when a new round starts. readBlockImpl() is responsible for that, but it is skipped when readLength is 0, and as a result poll() would fail with a timeout even though there is no new request and the server is healthy.
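The stale-deadline case described above can be sketched in isolation. This is a minimal illustration only, not the PR's code: the class ReadDeadline, its methods, and the injected clock are hypothetical names introduced here so the behavior is deterministic.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a deadline shared between a send step and a poll step. */
public final class ReadDeadline {
  private final LongSupplier clockNs; // injected clock, so the sketch does not depend on real time
  private final long timeoutNs;
  private long deadlineNs;

  public ReadDeadline(LongSupplier clockNs, long timeoutNs) {
    this.clockNs = clockNs;
    this.timeoutNs = timeoutNs;
    refresh();
  }

  /** Re-arms the deadline: a full timeout budget starting from "now". */
  public void refresh() {
    deadlineNs = clockNs.getAsLong() + timeoutNs;
  }

  /** Time left before the deadline; zero or negative means it has passed. */
  public long remainingNs() {
    return deadlineNs - clockNs.getAsLong();
  }

  public boolean isExpired() {
    return remainingNs() <= 0;
  }
}
```

If the deadline was armed for an earlier read and a later read skips the re-arm step, isExpired() is already true before any waiting happens. Calling refresh() at the start of every read restores a full budget.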
"Per-poll local budget has been replaced with a shared round-trip deadline"
But what is the advantage of such a change? It really makes the code much harder to understand.
We might spend some time waiting for gRPC flow control's isReady(), and then spend time waiting for the read response. Together, they might exceed the read timeout, resulting in a single read call blocking longer than the configured timeout before failing or retrying.
Suppose the timeout is 5s.
- Wait for isReady(): 2s
- Wait for poll(): 4s
Do you mean that it should fail in this case?
That was the original intent, but during testing with HBase I hit a number of regressions, and the logic changed several times. I ended up with a single budget that is refreshed under certain conditions. You are right, we might simply separate those.
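For reference, the 5s case from this thread can be worked through in code. The sketch below only compares the two policies under discussion; the class and method names are hypothetical and do not appear in the PR.

```java
/** Hypothetical comparison of a shared timeout budget versus separate per-wait budgets. */
public final class BudgetComparison {
  private BudgetComparison() { }

  /** Shared budget: the isReady() wait and the poll() wait draw from one timeout. */
  public static boolean passesWithSharedBudget(long timeoutMs, long readyWaitMs, long pollWaitMs) {
    return readyWaitMs + pollWaitMs <= timeoutMs;
  }

  /** Separate budgets: each wait is checked against the full timeout on its own. */
  public static boolean passesWithSeparateBudgets(long timeoutMs, long readyWaitMs, long pollWaitMs) {
    return readyWaitMs <= timeoutMs && pollWaitMs <= timeoutMs;
  }
}
```

With a 5000 ms timeout, a 2000 ms isReady() wait, and a 4000 ms poll() wait, the shared budget fails (6000 ms total) while separate budgets pass, which is the difference the question above is probing.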
…plify wait Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…gets for gRPC isReady()
What changes were proposed in this pull request?
When the Ozone client performs a streaming block read, XceiverClientGrpc.streamRead() previously called onNext() on the gRPC request observer unconditionally, without checking whether the underlying gRPC stream was ready to accept more data. Under load, this ignores gRPC's flow control: when the transport buffer is full, isReady() returns false, and writing anyway causes unbounded buffering on the client side instead of applying backpressure.
New gRPC flow control backpressure:
• Before sending a request, streamRead() now checks ClientCallStreamObserver.isReady(). If the stream is not ready, it parks the calling thread (10ms intervals via LockSupport.parkNanos) and waits for the stream to become ready, only then calling onNext().
• The wait is bounded by a deadline. If the caller supplies a read deadline (StreamingReadResponse.getReadDeadlineNs()), that deadline is honored; otherwise it falls back to the client-level read timeout. A TimeoutIOException is thrown if the stream never becomes ready within the deadline.
• The wait is interrupt-aware: on interruption it throws InterruptedIOException and restores the thread's interrupt flag.
• A new readDeadlineNs field is added to StreamingReadResponse. StreamBlockInputStream sets it before/after streamRead() so a single deadline bounds both the isReady() flow-control wait and the subsequent response wait in poll(). The poll() timeout logic is switched from a start-time/elapsed computation to this shared deadline, and the read path refreshes the deadline before each readBlock() so time spent waiting on flow control does not eat into the response-wait budget.
• XceiverClientSpi.streamRead() and the overridden XceiverClientGrpc.streamRead() signatures now declare throws IOException to surface the timeout/interrupt conditions.
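The bounded, interrupt-aware readiness wait described in the bullets above could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: a BooleanSupplier stands in for ClientCallStreamObserver.isReady() so the example runs without gRPC, a plain IOException stands in for TimeoutIOException, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: wait for a stream to become ready, bounded by a deadline. */
public final class FlowControlWait {
  // Park in 10 ms slices, as in the description above.
  private static final long PARK_INTERVAL_NS = TimeUnit.MILLISECONDS.toNanos(10);

  private FlowControlWait() { }

  /**
   * Blocks until isReady returns true, the deadline passes, or the thread is interrupted.
   *
   * @param isReady    stand-in for the stream's readiness check (assumption)
   * @param deadlineNs absolute deadline on the System.nanoTime() clock
   */
  public static void awaitReady(BooleanSupplier isReady, long deadlineNs) throws IOException {
    while (!isReady.getAsBoolean()) {
      if (Thread.interrupted()) {
        // Restore the flag so callers further up the stack can still observe the interrupt.
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("Interrupted while waiting for the stream to become ready");
      }
      // Subtracting nanoTime values is safe against numeric overflow.
      long remainingNs = deadlineNs - System.nanoTime();
      if (remainingNs <= 0) {
        // The PR description names TimeoutIOException here; IOException keeps the sketch JDK-only.
        throw new IOException("Stream did not become ready before the deadline");
      }
      LockSupport.parkNanos(Math.min(PARK_INTERVAL_NS, remainingNs));
    }
  }
}
```

A caller would invoke awaitReady(...) immediately before onNext(), passing either the caller-supplied read deadline or System.nanoTime() plus the client-level read timeout. Polling with parkNanos keeps the change small; registering an on-ready callback on the observer is an alternative design that avoids polling, at the cost of more coordination code.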
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-15551
How was this patch tested?
• New unit tests in TestXceiverClientGrpc and TestStreamBlockInputStream.
• An HBase cluster with HFiles stored on Ozone FS.