feat(wasm): @moq/wasm as a drop-in for @moq/net; flip watch/publish/boy #1726

Draft
kixelated wants to merge 8 commits into dev from claude/agitated-robinson-0a2511

@kixelated

Collaborator

Summary

Turns @moq/wasm (the wasm-bindgen build of moq-net) into a drop-in replacement for the hand-written TypeScript @moq/net, and flips the browser apps onto it. Goal: the moq-lite/IETF wire protocol lives in one place (Rust), not two.

rs/moq-wasm

  • Expanded from the consume-only spike to the full producer + consumer model: Session (connect / consume / publish), dual-use Broadcast (requested), TrackRequest, TrackProducer, TrackConsumer, TrackSubscriber, dual-use Group, and a real OriginConsumer with announce discovery + consume (no stub).
  • transport.rs advertises the moq ALPNs via web-transport-wasm 0.5.8's new ClientBuilder::with_protocols, so the relay negotiates lite-04/05 like @moq/net. Without it the browser sent no subprotocol and fell back to a dead lite-02 data path.
  • setup() caps tracing at WARN (default TRACE floods the console under announce churn).

js/wasm (TypeScript shim)

  • Connection (connect + Established + a ported Reload), model wrappers with the string/json/bool conveniences, options-object signatures, a reactive state.closed, synchronous lazy consume/subscribe, and number (not bigint) sequences.
  • Path / Time / Signals / Varint / TrackInfo re-exported from @moq/net (pure, wire-free helpers; tree-shaken).
  • Local http:// dev: connect fetches /certificate.sha256, pins it, and upgrades to https:// (mirrors @moq/net).
  • src never names dist/: imports the wasm-bindgen output via the #bindgen package-imports subpath. New vite plugin (js/common/vite-plugin-wasm) builds the wasm on demand and hot-rebuilds on Rust changes, so no manual just wasm is required.
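
The local http:// flow above (fetch /certificate.sha256, pin it, upgrade to https://) can be sketched roughly as follows. This is a minimal illustration, not the shim's actual code: `resolveTarget`, `hexToBytes`, `PinnedTarget`, and the injected `fetchText` are hypothetical names, and the fingerprint is assumed to be served as hex text. The `serverCertificateHashes` shape follows the WebTransport constructor options.

```typescript
// Hypothetical result type: the URL to dial plus optional pinned hashes, in
// the shape accepted by `new WebTransport(url, { serverCertificateHashes })`.
interface PinnedTarget {
  url: string;
  serverCertificateHashes?: { algorithm: "sha-256"; value: Uint8Array }[];
}

// Decode a hex fingerprint into raw bytes (hex encoding is an assumption).
function hexToBytes(hex: string): Uint8Array {
  const clean = hex.trim();
  if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) {
    throw new Error("fingerprint is not valid hex");
  }
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// `fetchText` is injected so the sketch runs without a network; in a browser
// it could be `(u) => fetch(u).then((r) => r.text())`.
async function resolveTarget(
  url: string,
  fetchText: (u: string) => Promise<string>,
): Promise<PinnedTarget> {
  // Non-http URLs are passed through untouched.
  if (!url.startsWith("http://")) return { url };

  const rest = url.slice("http://".length);
  const slash = rest.indexOf("/");
  const host = slash === -1 ? rest : rest.slice(0, slash);

  // Fetch the fingerprint over plain http, then dial https with it pinned.
  const hex = await fetchText(`http://${host}/certificate.sha256`);
  return {
    url: `https://${rest}`,
    serverCertificateHashes: [{ algorithm: "sha-256", value: hexToBytes(hex) }],
  };
}
```

Injecting the fetch keeps the URL rewrite and the decoding testable in isolation; the real connect path may structure this differently.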

Decouple serialization from the networking model

TypeScript treats @moq/net and @moq/wasm model classes as nominally incompatible (private fields), so apps can't mix them. Fix: keep the serialization layers backend-agnostic.

  • @moq/hang is serialization-only now: the container Consumer moved to js/watch, the Legacy.Producer to js/publish; the pure Format / encodeFrame stay in hang. Dropped hang's Net/Moq re-exports.
  • @moq/json and @moq/msf take minimal structural track interfaces instead of concrete @moq/net classes, so any backend (or an in-process test double) satisfies them. @moq/net is now a devDependency there.
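
To illustrate the nominal-typing wall and the structural fix described above, here is a small sketch. The class and interface names (`NetTrack`, `WasmTrack`, `TrackWriter`, `writeJson`) are hypothetical stand-ins, not the real @moq/net, @moq/wasm, or @moq/json types.

```typescript
// Two hypothetical backend classes. Each declares its own private field, so
// TypeScript treats them as incompatible despite identical public shapes.
class NetTrack {
  private readonly backend = "net";
  readonly name: string;
  readonly written: string[] = [];
  constructor(name: string) {
    this.name = name;
  }
  writeString(s: string): void {
    this.written.push(`${this.backend}:${s}`);
  }
}

class WasmTrack {
  private readonly backend = "wasm";
  readonly name: string;
  readonly written: string[] = [];
  constructor(name: string) {
    this.name = name;
  }
  writeString(s: string): void {
    this.written.push(`${this.backend}:${s}`);
  }
}

// The next line would be rejected by the type checker because the two
// classes have separate declarations of the private property `backend`:
// const t: NetTrack = new WasmTrack("video");

// Minimal structural interface: only what the serializer actually needs.
interface TrackWriter {
  readonly name: string;
  writeString(s: string): void;
}

// Accepts either backend, or a plain object acting as a test double.
function writeJson(track: TrackWriter, value: unknown): void {
  track.writeString(JSON.stringify(value));
}
```

Because `writeJson` only depends on the public shape, an in-process test double is just an object literal with `name` and `writeString`.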

Flip the apps

js/watch, js/publish, and js/moq-boy app code now import @moq/wasm instead of @moq/net (their in-process unit tests still use @moq/net).

Test plan

  • cargo check --workspace (moq-wasm is empty off-wasm); cargo build -p moq-wasm --target wasm32
  • just js check green across all packages (tsc + biome), incl. flipped watch/publish/boy
  • bun test green: watch 80 (incl. moved Consumer tests), hang 18 (incl. moved Format tests), json 32, publish 2
  • just dev from a clean tree: vite plugin auto-builds the wasm, no manual step, no resolve error
  • In-browser via @moq/wasm: connect (http→https + cert pin), version negotiation, announce discovery, subscribe
  • Known issue / not done: consuming a media track hard-freezes the renderer (a synchronous loop in the wasm consume path), independent of the relay. The on-load freeze was a separate issue, caused by a cluster relay's flapping .stats announces, and is gone with a clean relay. Investigating the media-consume freeze is the follow-up.

Branch targeting

dev per the wire/breaking-change rules (wasm wire path + breaking JS API changes across @moq/hang, @moq/json, @moq/msf, watch/publish/boy).

(Written by Claude)

…publish/boy

Expand the rs/moq-wasm bindings into a full consume + publish object model and
add a hand-written TypeScript shim so @moq/wasm presents the same surface as
@moq/net, then flip the apps to use it.

rs/moq-wasm:
- Full producer + consumer model: Session (connect/consume/publish), dual-use
  Broadcast (requested), TrackRequest, TrackProducer, TrackConsumer,
  TrackSubscriber, dual-use Group, plus a real OriginConsumer with announce
  discovery + consume (no stub).
- transport.rs advertises the moq ALPNs via web-transport-wasm 0.5.8's new
  ClientBuilder::with_protocols so the relay negotiates lite-04/05 like @moq/net
  (without it the browser sent no subprotocol and fell back to a dead lite-02).
- setup() caps tracing at WARN (default TRACE floods the console under churn).

js/wasm (TS shim):
- Connection (connect + Established + ported Reload), model wrappers with the
  string/json/bool conveniences, options-object signatures, reactive
  state.closed, synchronous lazy consume/subscribe, number sequences.
- Path/Time/Signals/Varint/TrackInfo re-exported from @moq/net (pure helpers).
- For local http:// dev, connect fetches /certificate.sha256, pins it, and
  upgrades to https:// (mirrors @moq/net).
- src imports the wasm-bindgen output via the "#bindgen" package-imports subpath
  (never names dist/); a new vite plugin (js/common/vite-plugin-wasm) builds the
  wasm on demand and hot-rebuilds on Rust changes, so no manual `just wasm`.

Decouple the serialization layers from the networking model so the apps can flip
backends without the TS private-field nominal-typing wall:
- @moq/hang is serialization-only now: the container Consumer moved to
  js/watch, the Legacy Producer to js/publish; the pure Format/encodeFrame stay.
- @moq/json and @moq/msf take minimal structural track interfaces instead of
  concrete @moq/net classes, so any backend (or a test double) satisfies them.
  @moq/net is now a devDependency there.

Flip js/watch, js/publish, and js/moq-boy app code from @moq/net to @moq/wasm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kixelated

Collaborator Author

Root cause of the media-consume freeze (investigated)

Traced the renderer freeze by instrumenting the wasm consume path and driving it directly in the browser against a clean standalone relay (lite-05-wip, bbb.hang fmp4).

It's main-thread event-loop starvation in the group read loop, not a logic bug.

  • recvGroup works; the catalog track (no timescale) reads fine.
  • For the video track (has a timescale → lite-05 per-frame timestamps), the spawned run_group task in rs/moq-net/src/lite/subscriber.rs drains the group's frame stream in a tight loop: instrumentation showed it reading hundreds of frames (size=6755, 11630, 20403, 25106, ..., sequential 24fps timestamps) in a single instant, never yielding.
  • Over a local relay with a buffered backlog, every stream.decode_maybe().await resolves synchronously from the reader's buffer — a microtask, not a real network wait. The browser drains all microtasks before any macrotask/render, so run_group starves the event loop (confirmed: setTimeout never fires; CDP eval times out). On native (tokio) the same loop is just a fast background task with nothing to starve.
  • This is independent of consumer pacing: a probe that read a single frame and stopped still froze, because the background run_group keeps draining.
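
The microtask-versus-macrotask behaviour described above can be reproduced in isolation. The sketch below is a TypeScript analogue, not the Rust run_group loop: `drainBuffered` is a hypothetical name, and `Promise.resolve` stands in for a stream read that resolves from an in-memory buffer.

```typescript
// Simulates a read loop whose every await resolves from an in-memory buffer.
// Returns whether a zero-delay timer managed to fire while the loop ran.
async function drainBuffered(frames: number): Promise<boolean> {
  let timerFired = false;
  // A timer callback is a macrotask: it only runs once the microtask queue
  // has been fully drained.
  setTimeout(() => {
    timerFired = true;
  }, 0);

  for (let i = 0; i < frames; i++) {
    // Already resolved, so the loop continues as a microtask and never
    // returns control to the event loop.
    await Promise.resolve(i);
  }
  return timerFired;
}
```

However many frames are drained, the function resolves to `false`: the timer cannot run until the loop finishes, which is the same starvation pattern as a setTimeout that never fires.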

The earlier on-load freeze was a separate issue: the actionstreamer cluster relay's flapping .stats/.../leaf0|leaf1 announces. Gone with a clean standalone relay (single non-flapping .stats/local/host).

Fix direction: run the wasm consume loop off the main thread (WebWorker). A naive comlink proxy-per-object would be too chatty (a postMessage round-trip per frame); the consume (and ideally decode) loop should live in the worker and post frames/VideoFrames to the main thread in bulk. A smaller interim band-aid is a cooperative macrotask yield in the wasm read loop so it stops starving the event loop. Tracking as the follow-up to this PR.
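
As a rough sketch of the "post in bulk" idea above, the worker side could accumulate frames and hand them over in batches instead of one postMessage per frame. `FrameBatcher` and its parameters are hypothetical; the `post` callback is injected so the sketch runs outside a worker (inside one it could be `(batch) => self.postMessage(batch)`).

```typescript
// Accumulates items and delivers them in batches through an injected callback.
class FrameBatcher<T> {
  private pending: T[] = [];
  private readonly maxBatch: number;
  private readonly post: (batch: T[]) => void;

  constructor(maxBatch: number, post: (batch: T[]) => void) {
    this.maxBatch = maxBatch;
    this.post = post;
  }

  // Queue one frame; deliver automatically once the batch is full.
  push(frame: T): void {
    this.pending.push(frame);
    if (this.pending.length >= this.maxBatch) this.flush();
  }

  // Deliver whatever is queued (for example at the end of a group).
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.post(batch);
  }
}
```

With a batch size of 2, five pushed frames followed by a flush produce three messages (2, 2, 1) rather than five. The right batch size and flush policy would depend on latency targets that this PR does not specify.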

(Written by Claude)

…'t freeze the browser

On wasm the subscriber runs on the browser's single thread. A relay sends its
whole cache backlog on subscribe, and over a local WebTransport every stream
read resolves synchronously (a microtask), so `run_group` drained frames back to
back without ever yielding to a macrotask, starving the event loop (no render,
no timers, setTimeout never fires) and freezing the page.

Add a `#[cfg(target_arch = "wasm32")]` `web_async::time::sleep(ZERO).await` per
frame so the page stays responsive. No-op on native, where this is a background
task with nothing to starve (verified: a native moq-cli subscriber reads the
same broadcast fine, 246 groups of <=60 frames, so this is browser-thread
starvation, not a grouping/loop bug).

A WebWorker would be the proper fix (true off-main-thread + no per-frame yield
throttle); this unblocks the wasm consume path in the meantime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kixelated

Collaborator Author

Freeze fixed (e86a61a) + ruled out a Rust bug

Followed up on the media-consume freeze.

Ruled out a logic/grouping bug via a native control. Ran moq-cli subscribe --name bbb.hang against the same relay with per-frame logging + a 300-frame/group guard: 246 groups, max 60 frames each, groups advancing normally, guard never fired. So run_group / GroupConsumer are correct — not a runaway/non-advancing group.

It's wasm main-thread starvation. On wasm the subscriber runs on the browser's single thread. The relay sends its whole cache backlog on subscribe, and over a local WebTransport every stream read resolves synchronously (a microtask), so run_group drained frames back-to-back without ever yielding to a macrotask — starving the event loop (setTimeout never fires, the page freezes). Native (tokio) just absorbs the same as a fast background task.

Fix: a #[cfg(target_arch = "wasm32")] web_async::time::sleep(ZERO).await per frame in run_group so the page stays responsive. No-op on native. Verified in-browser: the watch demo now loads alive (was a hard freeze before), and a direct consume probe keeps the renderer responsive while reading frames.

Caveat / follow-up: per-frame setTimeout(0) is clamped (~250fps), so it throttles raw backlog drain (fine for playback since the watch latency buffer skips old groups). The proper fix is to run the consume (+decode) loop in a WebWorker — true off-main-thread, no per-frame yield, and frees the main thread for render. Tracking that as the next step.
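
The effect of a per-frame macrotask yield can be shown with a TypeScript analogue of the Rust `sleep(ZERO)` call. This is a sketch under assumptions: `drainWithYield` is a hypothetical name, and `setTimeout(resolve, 0)` stands in for the wasm-side timer.

```typescript
// Same shape as a buffered read loop, but with a macrotask yield per frame.
// Returns whether a zero-delay timer managed to fire while the loop ran.
async function drainWithYield(frames: number): Promise<boolean> {
  let timerFired = false;
  setTimeout(() => {
    timerFired = true;
  }, 0);

  for (let i = 0; i < frames; i++) {
    await Promise.resolve(i); // the buffered "read" (a microtask)
    // Yield to the event loop so timers and rendering get a turn.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return timerFired;
}
```

Here the earlier timer fires during the first yield, so the loop no longer starves the event loop. The cost is the throttle mentioned above: if nested timers are clamped to 4 ms, at most 1000 / 4 = 250 yields fit in a second, which matches the ~250fps figure.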

(Written by Claude)

kixelated and others added 2 commits June 14, 2026 11:23
Logs every wasm consume-API call (consume/announced.next/subscribe/recvGroup/
nextGroup/readFrame) with an `await...` before and a `-> result` after, target
"wasm", at WARN so it shows with the default setup() tracing level.

Use it to see where the consume path stalls: the last `await...` line without a
matching `->` is the stuck call. Revert before merge.
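
The "last `await...` without a matching `->`" rule can be applied mechanically to a captured console log. The helper below is a hypothetical sketch: `findStuckCall` is not part of the PR, and the exact line format (`<call> await...` and `<call> -> <result>`) is an assumption about how the boundary logs look.

```typescript
const AWAIT_SUFFIX = " await...";
const RESULT_MARK = " -> ";

// Returns the most recent call that logged `await...` but never logged a
// matching `-> result`, or undefined if every call completed.
function findStuckCall(lines: string[]): string | undefined {
  const open: string[] = [];
  for (const line of lines) {
    if (line.endsWith(AWAIT_SUFFIX)) {
      // A call started: remember its name.
      open.push(line.slice(0, -AWAIT_SUFFIX.length));
      continue;
    }
    const arrow = line.indexOf(RESULT_MARK);
    if (arrow !== -1) {
      // A call finished: clear the most recent open entry with that name.
      const idx = open.lastIndexOf(line.slice(0, arrow));
      if (idx !== -1) open.splice(idx, 1);
    }
  }
  return open.length > 0 ? open[open.length - 1] : undefined;
}
```

For a log that ends with `readFrame await...` and no result line, the helper returns `readFrame`.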

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Swap the WARN-capped tracing-wasm for tracing-web plus a tracing-subscriber
Targets filter, so setup() takes a RUST_LOG-style directive and routes
moq_net's internal spans/events to the browser console. The boundary logs
only show the shim surface; this exposes the consume path inside moq_net
(lite::subscriber has the frame loop) to find where it stalls.

setup(filter) parses e.g. "warn,moq_net::lite=trace,wasm=trace" (Targets, not
EnvFilter, to avoid the regex/env-filter bloat on wasm). @moq/wasm's init()
reads localStorage.moq_log, so you can crank a target up from the browser
console and reload without a rebuild. Defaults to "warn".
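
To make the directive shape concrete, here is a TypeScript sketch of reading the stored filter and splitting it into a default level plus per-target overrides. The actual parsing happens in Rust via the Targets filter; `parseFilter`, `readDirective`, `Filter`, and `StorageLike` are hypothetical names, and the parsing is deliberately simplified (a bare entry that is not a level name is ignored).

```typescript
type Level = "trace" | "debug" | "info" | "warn" | "error";
const LEVELS: readonly string[] = ["trace", "debug", "info", "warn", "error"];

interface Filter {
  fallback: Level; // level for targets with no override
  targets: Record<string, Level>; // per-target overrides
}

function isLevel(s: string): s is Level {
  return LEVELS.indexOf(s) !== -1;
}

// Parse e.g. "warn,moq_net::lite=trace,wasm=trace".
function parseFilter(directive: string): Filter {
  const filter: Filter = { fallback: "warn", targets: {} };
  for (const raw of directive.split(",")) {
    const part = raw.trim();
    if (part === "") continue;
    const eq = part.indexOf("=");
    if (eq === -1) {
      // A bare level sets the default (simplification: other bare entries are skipped).
      if (isLevel(part)) filter.fallback = part;
      continue;
    }
    const level = part.slice(eq + 1);
    if (isLevel(level)) filter.targets[part.slice(0, eq)] = level;
  }
  return filter;
}

// Minimal stand-in for the part of `localStorage` the sketch needs.
interface StorageLike {
  getItem(key: string): string | null;
}

// Read the stored directive, falling back to "warn" when unset.
function readDirective(storage: StorageLike): string {
  return storage.getItem("moq_log") ?? "warn";
}
```

In a browser, `readDirective(localStorage)` would pick up a value set from the console before a reload.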

Temporary debug aid alongside the boundary logging; revert before merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…onsumer

GroupConsumer::poll_read_frame / poll_read_frame_chunks called poll_get_frame ->
frame.consume() on every poll, then dropped that FrameConsumer whenever the
frame's data wasn't complete yet (still in flight). A FrameConsumer is a kio
consumer handle, so that create+drop flips the frame's consumer count 0->1->0
each poll, and kio wakes the state's waiters on both the first-appears and
last-drops transitions -- the same waiters our own read registered on. Every
poll re-woke itself: a silent busy spin.

On a multi-threaded runtime the producer fills the frame concurrently so the
spin ends in microseconds (wasted CPU, no visible hang). On a single-thread
executor (wasm) the consumer's self-wake loop starves the producer, so the frame
never completes and the spin runs away into a hard freeze (~22M re-polls / ~45M
wakes on one frame).

Read the frame in place instead of through a consumer handle:
- kio: add `Producer::poll_ref`, a read-only counterpart to `Producer::poll`
  that registers a waiter on a read condition without taking a `Mut` (no
  modified flag, no consumer-count churn).
- model/frame: `FrameProducer::poll_read_all` reads the producer's own buffer
  once finished, via poll_ref. Stateless (always offset 0), so parallel readers
  are fine.
- model/group: `GroupState::poll_frame_read_all` reads the cached FrameProducer
  directly; poll_read_frame / poll_read_frame_chunks use it and no longer mint a
  FrameConsumer. GroupConsumer stays a plain derive(Clone) with no extra state.
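
The self-wake described above can be shown with a toy model. This is not kio's API: `FrameState`, `pollViaHandle`, `pollInPlace`, and `countPolls` are hypothetical, and the "executor" is just a loop that re-polls whenever the task's own waker was invoked.

```typescript
// Toy shared state: wakes all registered waiters when the consumer count
// goes 0 -> 1 (first appears) or 1 -> 0 (last drops).
class FrameState {
  private consumers = 0;
  private waiters: Array<() => void> = [];

  wait(waker: () => void): void {
    this.waiters.push(waker);
  }
  private wakeAll(): void {
    const woken = this.waiters;
    this.waiters = [];
    for (const waker of woken) waker();
  }
  addConsumer(): void {
    this.consumers += 1;
    if (this.consumers === 1) this.wakeAll();
  }
  dropConsumer(): void {
    this.consumers -= 1;
    if (this.consumers === 0) this.wakeAll();
  }
}

type Poll = (state: FrameState, waker: () => void) => void;

// Mint a consumer handle, register a waiter, then drop the handle because
// the frame is incomplete. The drop wakes the waiter that was just registered.
const pollViaHandle: Poll = (state, waker) => {
  state.addConsumer();
  state.wait(waker);
  state.dropConsumer();
};

// Register a waiter without touching the consumer count.
const pollInPlace: Poll = (state, waker) => {
  state.wait(waker);
};

// Re-poll while the task keeps waking itself, up to `budget` polls.
function countPolls(poll: Poll, budget: number): number {
  const state = new FrameState();
  let woken = true;
  let polls = 0;
  while (woken && polls < budget) {
    woken = false;
    polls += 1;
    poll(state, () => {
      woken = true;
    });
  }
  return polls;
}
```

In this model `countPolls(pollViaHandle, 1000)` exhausts the whole budget while the frame never changes, whereas `countPolls(pollInPlace, 1000)` polls once and then waits for a real wake from the producer.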

Also drop the per-frame web_async::time::sleep(ZERO) yield in run_group: it
didn't address this freeze (the spin is consumer-side) and spamming wasmtimer's
shared global timer driver is itself a hazard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
kixelated force-pushed the claude/agitated-robinson-0a2511 branch from de39d59 to c4992f2 June 14, 2026 21:15
kixelated and others added 3 commits June 14, 2026 15:55
…on-0a2511

# Conflicts:
#	bun.lock
#	demo/web/vite.config.ts
#	js/hang/src/container/legacy.ts
#	js/hang/src/index.ts
#	js/msf/src/catalog.ts
#	js/watch/src/container.ts
#	rs/kio/src/producer.rs
#	rs/moq-wasm/src/lib.rs
…sumer waiters (#1739)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kixelated

Collaborator Author

Update: rebased on dev + freeze resolved via the kio fixes

The media-consume hard-freeze (the known issue) is fixed, and the branch is now current with origin/dev.

Root cause (debugged in-browser): GroupConsumer::poll_read_frame minted a FrameConsumer (frame.consume()) on every poll and dropped it while the frame was still in flight. Each create/drop flipped the frame's consumer count 0↔1, and kio woke the value waiters on those transitions — so each poll re-woke its own read: a silent busy spin that pegged the single wasm thread (~22M re-polls / ~45M wakes on one frame). Multi-threaded native fills the frame concurrently so it never showed there.

Fix is in kio, not moq-net, so the original frame.consume()-per-poll code is unchanged.

This branch carries #1739 (cherry-picked) since dev doesn't have it yet; it'll dedupe on the next main → dev merge. My earlier moq-net workaround (read-in-place) is dropped.

Also in this merge:

  • Adapted moq-wasm to dev's builder Session API (Client::new().with_subscriber(…).with_publish(…), holding the subscribe OriginConsumer + publish OriginProducer).
  • Added CacheFull to @moq/wasm + had the read wrappers re-throw it (dev's new MSE path catches Moq.CacheFull to resync).
  • Reconciled the hang-decouple with dev's JSDoc/JSR additions.

Verified: cargo check --workspace, just js check (tsc + biome), cargo test -p kio, just wasm all green; in-browser the page stays responsive while consuming the video track (was an instant freeze before).

Before merge: revert the temporary debug commits (84ef29b6 boundary logging, and the wasm:-target logs); the setup() RUST_LOG/tracing-web wiring is arguably worth keeping.

(Written by Claude)

kixelated marked this pull request as draft June 16, 2026 19:53