
perf(distributed): skip eager probe session when chunkWorkerCount > 1 #916

Merged
jrusso1020 merged 1 commit into main from perf-skip-probe-when-parallel on May 17, 2026

Conversation

@jrusso1020
Collaborator

What

When renderChunk resolves chunkWorkerCount > 1, skip the eager
createCaptureSession + assertSwiftShader + initializeSession pre-warmup
that captureStage's parallel branch immediately closes anyway. Move the
SwiftShader safety probe into executeWorkerTask so each parallel worker
validates its own GPU backend before its first frame.
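
A minimal sketch of that worker-side check (the `executeWorkerTask`, `createCaptureSession`, `assertSwiftShader`, and `readWebGlVendorInfoFromCanvas` names come from this PR; the task/config shapes and the `renderFrames` call are placeholders, not the real signatures):

```ts
// Sketch only: each parallel worker validates its own GPU backend before
// rendering its first frame. Types and renderFrames are illustrative.
import { readWebGlVendorInfoFromCanvas } from "@hyperframes/engine";

async function executeWorkerTask(
  task: { frames: number[] },
  config: { browserGpuMode: "software" | "hardware" },
): Promise<void> {
  const session = await createCaptureSession(config);
  try {
    // Only renders that declared software GL get the SwiftShader check;
    // hardware-GL paths skip the probe entirely.
    if (config.browserGpuMode === "software") {
      await assertSwiftShader(session.page, readWebGlVendorInfoFromCanvas);
    }
    await renderFrames(session, task.frames); // placeholder for the capture loop
  } finally {
    await session.close();
  }
}
```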

Why

The OSS comment at the probeSession: session callsite already flagged
this:

// The parallel branch closes this session and spins up its own
// worker sessions, wasting the ~3-5s of pre-warmed setup. Worth a
// follow-up to skip pre-warmup when the resolved workerCount > 1.
probeSession: session,

In the dev distributed-render benchmark (12 producer-worker pods, v0.6.15
sidecar), texture-launch's chunk_p95 lands at ~25-35s with most of the
non-capture time being per-chunk fixed overhead. ~3-5s of that is the
probe pipeline we throw away — meaningful at the small chunks that real
maxParallelChunks settings produce.

How

  • packages/engine/src/utils/readWebGlVendorInfoFromCanvas.ts — new file.
    Moved from producer/services/distributed/renderChunk.ts; both
    producer and engine now need it. renderChunk.ts keeps a re-export
    via export { readWebGlVendorInfoFromCanvas } from "@hyperframes/engine";
    so @hyperframes/producer/distributed's public surface is unchanged
    (the existing publicExports.test.ts assertion still passes).

  • packages/engine/src/services/parallelCoordinator.ts
    executeWorkerTask now runs
    assertSwiftShader(session.page, readWebGlVendorInfoFromCanvas)
    after createCaptureSession when config.browserGpuMode === "software".
    Each parallel worker self-validates SwiftShader before its first
    frame.

    In-process renders default to browserGpuMode: "software", so they
    also pick up this safety net. Cost is one about:blank navigation +
    one page.evaluate per worker (~100-200ms, concurrent across workers
    so wall-clock impact ≈ slowest worker probe). Hardware-GL paths
    (browserGpuMode !== "software") are untouched.

  • packages/producer/src/services/distributed/renderChunk.ts
    resolve chunkWorkerCount up-front, skip the entire pre-warmup
    branch when chunkWorkerCount > 1, and pass probeSession: null in
    that case (see the sketch after this list). The sequential path
    (chunkWorkerCount === 1) is unchanged: it still pre-warms because
    captureStage's sequential branch reuses the probe.
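
A hedged sketch of that gating in `renderChunk` (`runCaptureStage`, `probeSession`, `createCaptureSession`, `assertSwiftShader`, and `initializeSession` appear in this PR; `resolveChunkWorkerCount`, `stageOptions`, and the `CaptureSession` type are placeholders for illustration):

```ts
// Sketch only: resolve the worker count once, and only pre-warm a probe
// session on the sequential path that actually reuses it.
const chunkWorkerCount = resolveChunkWorkerCount(config); // placeholder helper

let probeSession: CaptureSession | null = null;
if (chunkWorkerCount === 1) {
  // Sequential path: keep the pre-warmed probe; captureStage's sequential
  // branch reuses it directly.
  probeSession = await createCaptureSession(config);
  await assertSwiftShader(probeSession.page, readWebGlVendorInfoFromCanvas);
  await initializeSession(probeSession);
}

// Parallel path (chunkWorkerCount > 1): no pre-warmup here; each worker
// validates its own GPU backend inside executeWorkerTask instead.
await runCaptureStage({ ...stageOptions, probeSession });
```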

Test plan

  • Unit tests — parallelCoordinator.test.ts's existing
    distributeFrames/calculateOptimalWorkers coverage is unaffected;
    the new gating is small enough to verify by reading.
  • Existing tests pass:
    • packages/engine — all 605 tests pass (bun run test).
    • packages/producer distributed unit tests — 47 pass
      (assemble, plan, publicExports, chunkBoundary,
      planFormatBanlist, planSizeCap).
  • Lint + format — bunx oxlint and bunx oxfmt --check clean
    across the four touched files.
  • Docker-based regression renderChunk.test.ts byte-identical-retry
    test — runs the fixture at chunkWorkerCount=1 (5 frames), exercising
    the sequential path which is unchanged. The parallel path will be
    exercised by re-benchmarking on dev after release.
  • Re-benchmark texture-launch on dev after v0.6.16 release. Expect
    chunk_p95 to drop ~3-5s at chunks ∈ {3, 6, 8} (where
    chunkWorkerCount > 1 and the probe is currently wasted). chunks=12
    hits chunkWorkerCount=1 and is unchanged.

Notes for reviewers

  • The flagged // follow-up comment in
    services/distributed/renderChunk.ts is removed.
  • readWebGlVendorInfoFromCanvas is unchanged in behavior — just
    relocated.
  • I considered a per-task assertSoftwareGl?: boolean flag instead of
    gating on cfg.browserGpuMode === "software", but the latter matches
    the existing safety contract semantics cleanly: "if you declared
    software GL, we verify it." Open to flipping if preferred.

`renderChunk` was pre-warming a `createCaptureSession + assertSwiftShader +
initializeSession` pipeline before every chunk render. When the resolved
`chunkWorkerCount > 1`, the parallel branch in `captureStage` immediately
closes that probe and spins up fresh per-worker sessions — wasting the
~3-5s of pre-warmed setup. (The OSS comment at `runCaptureStage(...
probeSession: session ...)` flagged this as a follow-up.)

Move the SwiftShader assertion into `executeWorkerTask` so each parallel
worker validates its own GPU backend with a `chrome://gpu`-style canvas
probe (canvas + `WEBGL_debug_renderer_info` works on both regular Chrome
and `chrome-headless-shell`). Gated on `cfg.browserGpuMode === "software"`
so in-process renders that opt into software GL also pick up the safety
net, while hardware-GL paths are untouched.
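
For context, a sketch of the kind of in-page vendor probe `readWebGlVendorInfoFromCanvas` performs (the real implementation in `packages/engine` may differ; this is just the standard `WEBGL_debug_renderer_info` pattern, run inside `page.evaluate` so it only touches DOM/WebGL APIs):

```ts
// Sketch: read the unmasked WebGL vendor/renderer strings from a throwaway
// canvas. Under software GL the renderer string mentions SwiftShader, which
// is what an assertSwiftShader-style check would look for.
function readWebGlVendorInfoFromCanvas(): { vendor: string; renderer: string } | null {
  const canvas = document.createElement("canvas");
  const gl =
    canvas.getContext("webgl") ??
    (canvas.getContext("experimental-webgl") as WebGLRenderingContext | null);
  if (!gl) return null;

  // WEBGL_debug_renderer_info exposes the unmasked vendor/renderer strings,
  // e.g. an ANGLE/SwiftShader renderer string under software GL.
  const ext = gl.getExtension("WEBGL_debug_renderer_info");
  if (!ext) return null;

  return {
    vendor: String(gl.getParameter(ext.UNMASKED_VENDOR_WEBGL)),
    renderer: String(gl.getParameter(ext.UNMASKED_RENDERER_WEBGL)),
  };
}
```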

In `renderChunk`, compute `chunkWorkerCount` up-front and skip the entire
pre-warmup (createCaptureSession + assert + initialize) when > 1 — the
parallel workers cover it. Sequential path (chunkWorkerCount === 1) is
unchanged: it still pre-warms because `captureStage`'s sequential branch
reuses the probe.

Move `readWebGlVendorInfoFromCanvas` from
`packages/producer/src/services/distributed/renderChunk.ts` to
`packages/engine/src/utils/readWebGlVendorInfoFromCanvas.ts` (both
producer and engine need it now). `renderChunk.ts` re-exports the
function from `@hyperframes/engine` so downstream consumers that import
it from `@hyperframes/producer/distributed` keep working (the
`publicExports.test.ts` assertion is preserved).

Expected impact on the texture-launch fixture (dev, 12 producer-worker
pods, v0.6.15 sidecar; baseline from re-run sweep with `--chunk-size 10`):

  chunks=3   chunkWorkerCount=6 → ~3-5s/chunk saved (~5-10% wall)
  chunks=6   chunkWorkerCount=3 → ~3-5s/chunk saved (~7-12% wall)
  chunks=8   chunkWorkerCount=2 → ~3-5s/chunk saved (~10-13% wall)
  chunks=12  chunkWorkerCount=1 → no change (sequential path reuses probe)
jrusso1020 merged commit f01fccb into main on May 17, 2026
40 checks passed
jrusso1020 deleted the perf-skip-probe-when-parallel branch on May 17, 2026 08:18