Skip to content

fix(moq-net): read frames in place, without minting a per-poll FrameConsumer#1736

Closed
kixelated wants to merge 1 commit into
mainfrom
fix-group-consumer-wake-spin
Closed

fix(moq-net): read frames in place, without minting a per-poll FrameConsumer#1736
kixelated wants to merge 1 commit into
mainfrom
fix-group-consumer-wake-spin

Conversation

@kixelated

@kixelated kixelated commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator

Summary

GroupConsumer::poll_read_frame / poll_read_frame_chunks called poll_get_frameframe.consume() on every poll, then dropped that FrameConsumer whenever the frame's data wasn't complete yet (poll_read_all still Pending). A FrameConsumer is a kio consumer handle, so that create+drop flips the frame's consumer count 0→1→0 each poll, and kio wakes the state's waiters on both the "first consumer appears" and "last consumer drops" transitions — the same waiters the read itself registered on. So every poll re-woke itself — a silent busy spin.

  • On a multi-threaded runtime the producer fills the frame concurrently, so the spin ends in microseconds: wasted CPU, no visible hang. This is why it's been latent.
  • On a single-thread executor (e.g. the wasm/@moq/net-over-WASM consume path on dev) the consumer's self-wake loop starves the producer, so the frame never completes and the spin runs away into a hard browser freeze (~22M re-polls / ~45M wakes on one frame).

Fix

Read the frame in place instead of through a consumer handle:

  • kio: add Producer::poll_ref — a read-only counterpart to Producer::poll that registers a waiter on a read condition without taking a Mut (no modified flag, no consumer-count side effects).
  • model/frame: FrameProducer::poll_read_all reads the producer's own buffer once the frame is finished, via poll_ref. Stateless (always offset 0), so any number of parallel readers are fine.
  • model/group: GroupState::poll_frame_read_all reads the cached FrameProducer directly; poll_read_frame / poll_read_frame_chunks use it and no longer mint a FrameConsumer. GroupConsumer stays a plain derive(Clone) with no extra state.

Public API

GroupConsumer unchanged. One additive kio method (Producer::poll_ref). FrameProducer::poll_read_all is pub(crate).

Note: the kio Producer::poll_ref addition is additive; coordinating with separate in-flight kio work on the Mut/wake semantics. That work is complementary — this PR removes the consumer-count churn regardless.

Test plan

  • cargo test -p moq-net --lib — 346 pass; cargo test -p kio
  • cargo clippy -p moq-net -p kio clean, cargo fmt
  • Verified the freeze is gone end-to-end in a real browser via the wasm consume path (heartbeat kept ticking while a probe drained hundreds of frames and completed cleanly — vs. an instant freeze on frame 1 before). The dev wasm PR (feat(wasm): @moq/wasm as a drop-in for @moq/net; flip watch/publish/boy #1726) carries the same fix and is where the freeze was first observed.

🤖 Generated with Claude Code

(Written by Claude)

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 086aabaf-e780-4206-acad-fa9b4009f6d8

📥 Commits

Reviewing files that changed from the base of the PR and between 5aee99f and ea13e11.

📒 Files selected for processing (3)
  • rs/kio/src/producer.rs
  • rs/moq-net/src/model/frame.rs
  • rs/moq-net/src/model/group.rs

Walkthrough

This PR shifts frame reading from allocating and caching FrameConsumer handles to direct in-place polling of frame payloads. A new Producer::poll_ref method enables read-only, async-safe polling with waker registration that does not notify consumers. FrameProducer::poll_read_all uses this method to poll the producer's frame state directly and return the complete payload as zero-copy Bytes once the frame finishes writing. GroupState::poll_frame_read_all provides a group-level wrapper around frame polling. GroupConsumer::poll_read_frame and poll_read_frame_chunks are rewritten to call the new group helper instead of maintaining and re-polling a cached FrameConsumer, eliminating per-poll handle allocation and state management overhead.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: reading frames in place without creating temporary FrameConsumer objects on every poll.
Description check ✅ Passed The description is thorough and directly related to the changeset, explaining the problem, the fix, and the testing performed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix-group-consumer-wake-spin

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rs/moq-net/src/model/group.rs`:
- Around line 370-383: The issue is that poll_read_frame caches a frame in
self.frame while poll_next_frame ignores this cached state, causing frame
duplication and skipping when APIs are mixed. Fix this by having poll_next_frame
check and return self.frame.take() before resolving a fresh frame, ensuring the
same frame is not processed twice. Additionally, add a regression test that
verifies the Pending to next_frame to complete flow works correctly and doesn't
duplicate or skip frames.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c19cd729-d04a-4426-aa6d-4ef246989e61

📥 Commits

Reviewing files that changed from the base of the PR and between 15e2e88 and 5aee99f.

📒 Files selected for processing (1)
  • rs/moq-net/src/model/group.rs

Comment on lines 370 to 383
pub fn poll_read_frame(&mut self, waiter: &kio::Waiter) -> Poll<Result<Option<Bytes>>> {
let Some(mut frame) = ready!(self.poll(waiter, |state| state.poll_get_frame(self.index))?) else {
return Poll::Ready(Ok(None));
};
if self.frame.is_none() {
let index = self.index;
let Some(frame) = ready!(self.poll(waiter, |state| state.poll_get_frame(index))?) else {
return Poll::Ready(Ok(None));
};
self.frame = Some(frame);
}

let data = ready!(frame.poll_read_all(waiter))?;
let data = ready!(self.frame.as_mut().unwrap().poll_read_all(waiter))?;
self.frame = None;
self.index += 1;

Poll::Ready(Ok(Some(data)))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Coordinate next_frame with the cached in-flight frame.

Persisting self.frame here makes GroupConsumer stateful across polls, but poll_next_frame still ignores that state. A reachable sequence is: poll_read_frame* caches frame N and returns Pending, next_frame then resolves frame N again and bumps index, and when the cached read finally completes it bumps index a second time. That duplicates frame N and skips frame N + 1.

Either have poll_next_frame return self.frame.take() before resolving a fresh frame, or explicitly reject switching APIs while frame.is_some(). Please add a regression test for the Pending -> next_frame -> complete path with the fix.

Suggested direction
 pub fn poll_next_frame(&mut self, waiter: &kio::Waiter) -> Poll<Result<Option<FrameConsumer>>> {
-	let Some(frame) = ready!(self.poll(waiter, |state| state.poll_get_frame(self.index))?) else {
-		return Poll::Ready(Ok(None));
-	};
+	let frame = if let Some(frame) = self.frame.take() {
+		frame
+	} else {
+		let Some(frame) = ready!(self.poll(waiter, |state| state.poll_get_frame(self.index))?) else {
+			return Poll::Ready(Ok(None));
+		};
+		frame
+	};
 
 	self.index += 1;
 	Poll::Ready(Ok(Some(frame)))
 }

Also applies to: 392-405

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rs/moq-net/src/model/group.rs` around lines 370 - 383, The issue is that
poll_read_frame caches a frame in self.frame while poll_next_frame ignores this
cached state, causing frame duplication and skipping when APIs are mixed. Fix
this by having poll_next_frame check and return self.frame.take() before
resolving a fresh frame, ensuring the same frame is not processed twice.
Additionally, add a regression test that verifies the Pending to next_frame to
complete flow works correctly and doesn't duplicate or skip frames.

…onsumer

GroupConsumer::poll_read_frame / poll_read_frame_chunks called poll_get_frame ->
frame.consume() on every poll, then dropped that FrameConsumer whenever the
frame's data wasn't complete yet (still in flight). A FrameConsumer is a kio
consumer handle, so that create+drop flips the frame's consumer count 0->1->0
each poll, and kio wakes the state's waiters on both the first-appears and
last-drops transitions -- the same waiters our own read registered on. Every
poll re-woke itself: a silent busy spin.

On a multi-threaded runtime the producer fills the frame concurrently so the
spin ends in microseconds (wasted CPU, no visible hang). On a single-thread
executor (e.g. wasm) the consumer's self-wake loop starves the producer, so the
frame never completes and the spin runs away into a hard freeze.

Read the frame in place instead of through a consumer handle:
- kio: add `Producer::poll_ref`, a read-only counterpart to `Producer::poll`
  that registers a waiter on a read condition without taking a `Mut` (no
  modified flag, no consumer-count churn).
- model/frame: `FrameProducer::poll_read_all` reads the producer's own buffer
  once finished, via poll_ref. Stateless (always offset 0), so parallel readers
  are fine.
- model/group: `GroupState::poll_frame_read_all` reads the cached FrameProducer
  directly; poll_read_frame / poll_read_frame_chunks use it and no longer mint a
  FrameConsumer. GroupConsumer stays a plain derive(Clone) with no extra state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kixelated kixelated force-pushed the fix-group-consumer-wake-spin branch from 5aee99f to ea13e11 Compare June 14, 2026 21:17
@kixelated kixelated changed the title fix(moq-net): hold the frame consumer across read_frame polls (no wake-spin) fix(moq-net): read frames in place, without minting a per-poll FrameConsumer Jun 14, 2026
@kixelated

Copy link
Copy Markdown
Collaborator Author

Superseded by the kio fixes #1735 (read-only Producer::poll predicate) and #1739 (split State.waiters so consume()/Consumer::drop no longer wake value readers). Those fix the consumer-count-churn spin at the source, so the original frame.consume()-per-poll code no longer freezes — verified in-browser. No moq-net workaround needed.

(Written by Claude)

@kixelated kixelated closed this Jun 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant