perf(distributed): skip eager probe session when chunkWorkerCount > 1 #916
Merged
`renderChunk` was pre-warming a `createCaptureSession + assertSwiftShader + initializeSession` pipeline before every chunk render. When the resolved `chunkWorkerCount > 1`, the parallel branch in `captureStage` immediately closes that probe and spins up fresh per-worker sessions, wasting the ~3-5s of pre-warmed setup. (The OSS comment at `runCaptureStage(... probeSession: session ...)` flagged this as a follow-up.)

Move the SwiftShader assertion into `executeWorkerTask` so each parallel worker validates its own GPU backend against a `chrome://gpu`-style canvas probe (canvas + WEBGL_debug_renderer_info works on both regular Chrome and `chrome-headless-shell`). Gated on `cfg.browserGpuMode === "software"` so in-process renders that opt into software GL also pick up the safety net, while hardware-GL paths are untouched.

In `renderChunk`, compute `chunkWorkerCount` up-front and skip the entire pre-warmup (createCaptureSession + assert + initialize) when > 1; the parallel workers cover it. Sequential path (chunkWorkerCount === 1) is unchanged: it still pre-warms because `captureStage`'s sequential branch reuses the probe.

Move `readWebGlVendorInfoFromCanvas` from `packages/producer/src/services/distributed/renderChunk.ts` to `packages/engine/src/utils/readWebGlVendorInfoFromCanvas.ts` (both producer and engine need it now). `renderChunk.ts` re-exports the function from `@hyperframes/engine` so downstream consumers that import it from `@hyperframes/producer/distributed` keep working (the `publicExports.test.ts` assertion is preserved).

Expected impact on the texture-launch fixture (dev, 12 producer-worker pods, v0.6.15 sidecar; baseline from re-run sweep with `--chunk-size 10`):

- chunks=3, chunkWorkerCount=6 → ~3-5s/chunk saved (~5-10% wall)
- chunks=6, chunkWorkerCount=3 → ~3-5s/chunk saved (~7-12% wall)
- chunks=8, chunkWorkerCount=2 → ~3-5s/chunk saved (~10-13% wall)
- chunks=12, chunkWorkerCount=1 → no change (sequential path reuses probe)
What
When `renderChunk` resolves `chunkWorkerCount > 1`, skip the eager `createCaptureSession + assertSwiftShader + initializeSession` pre-warmup that `captureStage`'s parallel branch immediately closes anyway. Move the SwiftShader safety probe into `executeWorkerTask` so each parallel worker validates its own GPU backend before its first frame.
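A minimal sketch of the kind of per-worker check described above, assuming the canvas probe yields the unmasked vendor and renderer strings from `WEBGL_debug_renderer_info`. The names `WebGlVendorInfo`, `isSwiftShader`, and `assertSoftwareRenderer` are hypothetical and are not the actual `assertSwiftShader` implementation; the browser-side canvas read is left out so the block runs outside a browser.

```typescript
// Hypothetical shape for what a canvas probe might return. The real return
// type of readWebGlVendorInfoFromCanvas is not shown in this PR.
interface WebGlVendorInfo {
  vendor: string | null;
  renderer: string | null;
}

// Assumption: SwiftShader identifies itself in the unmasked strings, e.g.
// "ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device (Subzero) ...), SwiftShader driver)".
function isSwiftShader(info: WebGlVendorInfo): boolean {
  const haystack = `${info.vendor ?? ""} ${info.renderer ?? ""}`.toLowerCase();
  return haystack.includes("swiftshader");
}

// Throws when the worker's backend does not look like software GL, so a
// misconfigured worker fails before it captures its first frame.
function assertSoftwareRenderer(info: WebGlVendorInfo): void {
  if (!isSwiftShader(info)) {
    throw new Error(
      `Expected SwiftShader, got vendor=${info.vendor} renderer=${info.renderer}`,
    );
  }
}
```

Running the check once per worker, rather than once per chunk on a probe session that is then discarded, keeps the safety net while dropping the wasted setup.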
Why
The OSS comment at the `probeSession: session` callsite already flagged this as a follow-up.

In the dev distributed-render benchmark (12 producer-worker pods, v0.6.15 sidecar), texture-launch's `chunk_p95` lands at ~25-35s, with most of the non-capture time being per-chunk fixed overhead. ~3-5s of that is the probe pipeline we throw away, which is meaningful at the small chunks that real `maxParallelChunks` settings produce.
How
- `packages/engine/src/utils/readWebGlVendorInfoFromCanvas.ts`: new file. Moved from `producer/services/distributed/renderChunk.ts`; both producer and engine now need it. `renderChunk.ts` keeps a re-export via `export { readWebGlVendorInfoFromCanvas } from "@hyperframes/engine";` so `@hyperframes/producer/distributed`'s public surface is unchanged (the existing `publicExports.test.ts` assertion still passes).
- `packages/engine/src/services/parallelCoordinator.ts`: `executeWorkerTask` now runs `assertSwiftShader(session.page, readWebGlVendorInfoFromCanvas)` after `createCaptureSession` when `config.browserGpuMode === "software"`. Each parallel worker self-validates SwiftShader before its first frame. In-process renders default to `browserGpuMode: "software"`, so they also pick up this safety net. Cost is one `about:blank` navigation + one `page.evaluate` per worker (~100-200ms, concurrent across workers so wall-clock impact ≈ slowest worker probe). Hardware-GL paths (`browserGpuMode !== "software"`) are untouched.
- `packages/producer/src/services/distributed/renderChunk.ts`: resolve `chunkWorkerCount` up-front, skip the entire pre-warmup branch when `chunkWorkerCount > 1`, and pass `probeSession: null` in that case. Sequential path (`chunkWorkerCount === 1`) is unchanged: it still pre-warms because `captureStage`'s sequential branch reuses the probe.
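The two gates described in this list can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `SessionDeps`, `shouldPrewarmProbe`, `resolveProbeSession`, and `startWorkerSession` are hypothetical names, the helper signatures are simplified (the real `assertSwiftShader` takes `session.page` and the canvas reader), and `"hardware"` is an assumed label for any non-software mode.

```typescript
// Simplified stand-ins for the real session helpers; signatures are assumptions.
interface Session {
  id: string;
}

interface SessionDeps {
  createCaptureSession(): Promise<Session>;
  assertSwiftShader(session: Session): Promise<void>;
  initializeSession(session: Session): Promise<void>;
}

// Assumed value set: only "software" is named in this PR.
type BrowserGpuMode = "software" | "hardware";

// renderChunk side: only the sequential path reuses the probe session.
function shouldPrewarmProbe(chunkWorkerCount: number): boolean {
  return chunkWorkerCount <= 1;
}

// Returns null for the parallel path, mirroring `probeSession: null`.
async function resolveProbeSession(
  chunkWorkerCount: number,
  deps: SessionDeps,
): Promise<Session | null> {
  if (!shouldPrewarmProbe(chunkWorkerCount)) {
    return null; // parallel workers create and validate their own sessions
  }
  const session = await deps.createCaptureSession();
  await deps.assertSwiftShader(session);
  await deps.initializeSession(session);
  return session;
}

// Worker side: validate the GPU backend only when software GL was declared.
async function startWorkerSession(
  mode: BrowserGpuMode,
  deps: SessionDeps,
): Promise<Session> {
  const session = await deps.createCaptureSession();
  if (mode === "software") {
    await deps.assertSwiftShader(session);
  }
  return session;
}
```

With this split, a chunk with `chunkWorkerCount = 6` creates no probe session at all, while each of its six workers runs its own assertion concurrently.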
Test plan
- `parallelCoordinator.test.ts`'s existing `distributeFrames` / `calculateOptimalWorkers` coverage is unaffected; behavior gating is small enough to verify by reading.
- `packages/engine`: all 605 tests pass (`bun run test`).
- `packages/producer` distributed unit tests: 47 pass (`assemble`, `plan`, `publicExports`, `chunkBoundary`, `planFormatBanlist`, `planSizeCap`).
- `bunx oxlint` and `bunx oxfmt --check` clean across the four touched files.
- `renderChunk.test.ts` byte-identical-retry test: runs the fixture at `chunkWorkerCount=1` (5 frames), exercising the sequential path, which is unchanged. The parallel path will be exercised by re-benchmarking on dev after release.
- Re-benchmark `texture-launch` on dev after the v0.6.16 release. Expect `chunk_p95` to drop ~3-5s at `chunks ∈ {3, 6, 8}` (where `chunkWorkerCount > 1` and the probe is currently wasted). `chunks=12` hits `chunkWorkerCount=1` and is unchanged.

Notes for reviewers
- The `// follow-up` comment in `services/distributed/renderChunk.ts` is removed.
- `readWebGlVendorInfoFromCanvas` is unchanged in behavior, just relocated.
- Considered an `assertSoftwareGl?: boolean` flag instead of gating on `cfg.browserGpuMode === "software"`, but the latter matches the existing safety contract semantics cleanly: "if you declared software GL, we verify it." Open to flipping if preferred.
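To make the trade-off concrete, here is a small sketch comparing the two gating options. `WorkerConfig`, `shouldAssertByMode`, and `shouldAssertByFlag` are hypothetical names; the `"hardware"` label and the choice to treat a missing `assertSoftwareGl` as `false` are assumptions made only for illustration.

```typescript
interface WorkerConfig {
  // "hardware" is an assumed label for any non-software mode.
  browserGpuMode: "software" | "hardware";
  // The alternative explicit flag; not part of the merged change.
  assertSoftwareGl?: boolean;
}

// Gate used in this PR: declaring software GL implies verification.
function shouldAssertByMode(cfg: WorkerConfig): boolean {
  return cfg.browserGpuMode === "software";
}

// Alternative gate: verification only when explicitly requested.
// Assumption: an omitted flag means "do not assert".
function shouldAssertByFlag(cfg: WorkerConfig): boolean {
  return cfg.assertSoftwareGl ?? false;
}
```

Under these assumptions, a caller that sets `browserGpuMode: "software"` but forgets the flag would skip the check with the flag-based gate, whereas the mode-based gate cannot drift from the declared GPU mode.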