fix(h2): return Poll::Pending when poll_capacity is not ready in UpgradedSendStreamTask #4050
abbshr wants to merge 2 commits into hyperium:master
…adedSendStreamTask
Fix a backpressure bypass bug in UpgradedSendStreamTask::tick() where poll_capacity() returning Poll::Pending caused a 'break 'capacity' that fell through to rx.poll_next() -> send_data(), pushing data into the h2 send buffer without available capacity. This broke the HTTP/2 flow control chain, causing unbounded memory growth (OOM) when downstream consumers were slower than upstream producers.
The fix changes 'break 'capacity' to 'return Poll::Pending', which correctly suspends the task until a WINDOW_UPDATE frame restores send capacity. The now-unused 'capacity label is also removed.
This bug was introduced in hyper v1.8.0 (PR hyperium#3967) and affects v1.8.0, v1.8.1, and v1.9.0. A single HTTP/2 CONNECT tunnel with asymmetric upstream/downstream speeds could trigger OOM within seconds.
Add four integration tests covering H2 CONNECT backpressure scenarios:
- h2_connect_backpressure_respected: small window + large data transfer
- h2_connect_zero_window_then_release: normal path regression guard
- h2_connect_reset_during_backpressure: RST_STREAM error propagation
- h2_connect_backpressure_bidirectional: bidirectional data + backpressure
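The difference between falling through and returning early can be sketched with a toy model. This is only an illustration under stated assumptions: `MockSendStream`, `tick_fall_through`, and `tick_return_pending` are hypothetical names, not part of hyper or h2, and the mock simply counts bytes queued beyond the flow-control window.

```rust
use std::task::Poll;

/// Hypothetical stand-in for an h2 send stream (not the real h2 API).
struct MockSendStream {
    /// Send window currently granted by the peer, in bytes.
    window: usize,
    /// Bytes accepted by `send_data` that did not fit in the window.
    buffered: usize,
}

impl MockSendStream {
    /// Ready with the available window, or Pending when the window is empty.
    fn poll_capacity(&self) -> Poll<usize> {
        if self.window > 0 {
            Poll::Ready(self.window)
        } else {
            Poll::Pending
        }
    }

    /// Accepts data unconditionally; anything beyond the window is buffered.
    fn send_data(&mut self, len: usize) {
        let sent = len.min(self.window);
        self.window -= sent;
        self.buffered += len - sent;
    }
}

/// Models the old behavior: a Pending capacity result is ignored and the
/// chunk is sent anyway.
fn tick_fall_through(stream: &mut MockSendStream, chunk: usize) -> Poll<()> {
    let _ = stream.poll_capacity();
    stream.send_data(chunk);
    Poll::Ready(())
}

/// Models the proposed behavior: nothing is sent while capacity is Pending.
fn tick_return_pending(stream: &mut MockSendStream, chunk: usize) -> Poll<()> {
    if stream.poll_capacity().is_pending() {
        return Poll::Pending;
    }
    stream.send_data(chunk);
    Poll::Ready(())
}
```

With a zero window, repeated calls to `tick_fall_through` keep growing `buffered`, while `tick_return_pending` leaves it at zero until the window is restored.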
cc @seanmonstar
src/proto/h2/upgrade.rs (Outdated)
          )));
      }
-     Poll::Pending => break 'capacity,
+     Poll::Pending => return Poll::Pending,
The comment from L71-L74 I think is relevant to why this previously did not return early. We want to make sure the waker is registered with each of the futures, so that if one side "cancels", the task can clean up quickly.
- We want to notice when capacity has become available.
- Or when the remote has sent a RST_STREAM (or other error)
- Or when our bytes sender (on the me.rx side) has closed and no longer expects to send more data.
Said another way, if we're waiting for capacity, and the user drops the Upgraded type (meaning they no longer want to write), this UpgradedSendStreamTask will not notice and will hang around until capacity is eventually given (if the peer ever gives it), and only then hang up.
I get what you're trying to do, but I think the types or channels would need to be adjusted a little to handle those cases.
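The waker concern can be sketched with std-only toy types. This is a minimal illustration, not hyper's code: `CountingWaker`, `Source`, and `wakes_after_rx_close` are hypothetical names, and `Source` stands in for any event source (capacity, reset, or the rx channel) that can only wake a task that has polled it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Counts how many times the task was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical event source: it remembers the waker of whoever polled it last.
#[derive(Default)]
struct Source {
    ready: bool,
    waker: Option<Waker>,
}

impl Source {
    fn poll(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if self.ready {
            Poll::Ready(())
        } else {
            // Registration happens only here, as a side effect of polling.
            self.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }

    /// The event happens; wake the registered task, if there is one.
    fn fire(&mut self) {
        self.ready = true;
        if let Some(waker) = self.waker.take() {
            waker.wake();
        }
    }
}

/// Runs one tick that is blocked on capacity, then closes the rx side.
/// Returns how many wake-ups the task received.
fn wakes_after_rx_close(poll_rx_while_blocked: bool) -> usize {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let mut capacity = Source::default();
    let mut rx_closed = Source::default();

    // Capacity is not available, so this registers the waker and is Pending.
    let _ = capacity.poll(&mut cx);
    if poll_rx_while_blocked {
        let _ = rx_closed.poll(&mut cx);
    }

    // The user drops their write half: the rx side closes.
    rx_closed.fire();
    counter.0.load(Ordering::SeqCst)
}
```

In this model, a tick that returns right after the capacity check never registers with the rx source, so the close goes unnoticed until something else wakes the task.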
Thanks for replying.
I re-read the comments and the implementation. As the comments say, there are three sub-tasks within tick: h2_tx.poll_capacity(), h2_tx.poll_reset(), and rx.poll_next().
I agree with the case you described: "when the remote has sent a RST_STREAM (or other error)" should be noticed as soon as possible, because it relates to the semantics of the h2 context. If the h2 channel can no longer work, the whole task should be dropped.
But rx.poll_next(), I think, is the other half of the whole transaction and comes after the h2 side: if no write operations are permitted on the h2 channel, the rx channel should not be polled.
So I think the change should be something like this:
    // check capacity
    let h2_has_capacity = loop {
        match me.h2_tx.poll_capacity(cx) {
            ...
            // just break out of the loop and report that there is no capacity
            Poll::Pending => break false,
        }
    };
    // handle the h2_tx RST_STREAM case
    match me.h2_tx.poll_reset(cx) {
        ....
    }
    // without capacity, do not poll rx at all
    if !h2_has_capacity {
        return Poll::Pending;
    }
    // handle rx side poll data
    match me.rx.as_mut().poll_next(cx) {
        Poll::Ready(Some(cursor)) => {
            me.h2_tx
                .send_data(SendBuf::Cursor(cursor), false)
                .map_err(crate::Error::new_body_write)?;
        }
        Poll::Ready(None) => {
            me.h2_tx
                .send_data(SendBuf::None, true)
                .map_err(crate::Error::new_body_write)?;
            return Poll::Ready(Ok(()));
        }
        Poll::Pending => {
            return Poll::Pending;
        }
    }
Correct me if I'm wrong about this.
When poll_capacity returns Pending, defer the return and check poll_reset first. This ensures the reset waker is registered via poll_reset (which shares the same send_task slot in h2 internally), enabling earlier RST_STREAM detection without an extra poll round-trip. While h2 internally calls notify_send() on RST_STREAM/EOF/error (so poll_capacity's waker alone would eventually be woken), polling poll_reset here provides immediate detection if RST_STREAM has already arrived, avoiding one unnecessary suspend/wake cycle. The h2_has_capacity flag cleanly separates the capacity check from the control flow, making the manual select-over-3-futures pattern more readable.
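The deferred-return ordering described above can be sketched as a small, self-contained function. This is a simplified model under stated assumptions, not the actual patch: `Outcome` and `tick_once` are hypothetical names, the capacity and reset polls are passed in as already-computed `Poll` values, and the rx side is a closure so that it is visible whether it gets polled at all.

```rust
use std::task::Poll;

/// Hypothetical summary of what a single tick decided.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Nothing to do yet; wait to be woken.
    Pending,
    /// The peer reset the stream; the task should end with an error.
    ResetByPeer,
    /// A chunk of this many bytes was handed to the send stream.
    Sent(usize),
    /// The rx side closed; end-of-stream was sent.
    Finished,
}

/// One pass over the three sub-polls, in the order proposed in the thread:
/// capacity first (result deferred), then reset, then rx only with capacity.
fn tick_once(
    capacity: Poll<usize>,
    reset: Poll<()>,
    poll_rx: impl FnOnce() -> Poll<Option<usize>>,
) -> Outcome {
    // Remember the capacity result instead of returning right away.
    let has_capacity = capacity.is_ready();

    // The reset check always runs, even when capacity is Pending.
    if reset.is_ready() {
        return Outcome::ResetByPeer;
    }

    // Without capacity, rx is not polled, so no data is pulled or sent.
    if !has_capacity {
        return Outcome::Pending;
    }

    match poll_rx() {
        Poll::Ready(Some(len)) => Outcome::Sent(len),
        Poll::Ready(None) => Outcome::Finished,
        Poll::Pending => Outcome::Pending,
    }
}
```

Passing rx as a closure is only a device for this sketch: it makes it easy to check that a tick blocked on capacity returns without touching the rx side.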
Fix #4049