chore(engine): observability for sub-composition timeline poll by jrusso1020 · Pull Request #945 · heygen-com/hyperframes

jrusso1020 · 2026-05-18T21:26:17Z

What

Adds a single structured log line per render in pollSubCompositionTimelines (packages/engine/src/services/frameCapture.ts) capturing:

hostIds — every [data-composition-id] host on the page
timelineIdsBefore / timelineIdsAfter — window.__timelines keys around the poll
addedDuringPoll — the diff (which timelines registered while we waited)
pollMs — wall-clock time spent polling
ready — whether all host timelines were present at the deadline
rebindFired — whether __hfForceTimelineRebind() was invoked
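
For orientation, the fields above could be modelled roughly as follows. This is a minimal sketch, not the code in frameCapture.ts: the `PollObservation` type, the `buildPollObservation` helper, and the `event` tag are hypothetical names, and it assumes that each host's composition ID is also its key in window.__timelines. Only the field names come from the list above.

```typescript
// Hypothetical shape of the per-render log record; field names follow the list above.
interface PollObservation {
  event: "pollSubCompositionTimelines"; // assumed tag, only there to make the line greppable
  hostIds: string[];
  timelineIdsBefore: string[];
  timelineIdsAfter: string[];
  addedDuringPoll: string[];
  pollMs: number;
  ready: boolean;
  rebindFired: boolean;
}

// Hypothetical helper that assembles the record from the two snapshots.
function buildPollObservation(
  hostIds: string[],
  timelineIdsBefore: string[],
  timelineIdsAfter: string[],
  pollMs: number,
  rebindFired: boolean,
): PollObservation {
  const before = new Set(timelineIdsBefore);
  const after = new Set(timelineIdsAfter);
  return {
    event: "pollSubCompositionTimelines",
    hostIds,
    timelineIdsBefore,
    timelineIdsAfter,
    // IDs present after the poll that were absent in the before-snapshot.
    addedDuringPoll: timelineIdsAfter.filter((id) => !before.has(id)),
    pollMs,
    // Assumption: a host is "ready" when a timeline with the same ID is registered.
    ready: hostIds.every((id) => after.has(id)),
    rebindFired,
  };
}

// One JSON object per line keeps the record easy to grep and parse from CI logs.
const line = JSON.stringify(
  buildPollObservation(["intro", "map"], ["intro"], ["intro", "map"], 412, true),
);
console.log(line);
```

The example values ("intro", "map", 412) are made up for illustration and do not come from a real run.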

Why

Regression CI has been flaky on main and PRs since the 2026-05-18 renderer stack for async-loaded map blocks (#928 → main). Different fixtures fail on different runs:

main 27efcd0f (v0.6.22): shard-3 style-7-prod, 60 failed frames
main 8163f380 (v0.6.21): shard-3 style-7-prod, 60 failed frames
PR #926 (fix(ci): pin chrome-headless-shell + clamp PSNR checkpoint to a valid frame): shard-8 gsap-letters-render-compat, 86 failed frames
fallow PR: shard-1 style-3-prod, 7 failed frames

PSNR drops are 18–30 dB across animation-heavy intervals, which is the fingerprint of render time shifting by ~1-2 frames relative to the baseline. Hypothesis: the count-based rebind heuristic in pollSubCompositionTimelines (introduced in 2c84c9a, amended in 4bf2fa5) is racy. On slow CI runners, sub-composition scripts register their timelines between the before-snapshot and the end of the poll, so the rebind fires, seek timing shifts, and PSNR drops. On fast runners, the same fixture registers everything synchronously, so no rebind fires and the output matches the baseline.
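
The hypothesised race can be pictured with a toy model. This is only an illustration of the timing argument, not the engine's logic: `rebindWouldFire` and all millisecond values are hypothetical.

```typescript
// Toy model: given when each sub-composition timeline registers, would the
// poll see a "new" timeline and therefore trigger a rebind?
// All times are milliseconds since page load and are purely illustrative.
function rebindWouldFire(
  registrationTimesMs: number[],
  snapshotAtMs: number,
  deadlineAtMs: number,
): boolean {
  // A timeline registered after the before-snapshot but no later than the
  // deadline appears as newly added during the poll.
  return registrationTimesMs.some((t) => t > snapshotAtMs && t <= deadlineAtMs);
}

// Fast runner: every timeline registers before the snapshot is taken.
const fastRunner = rebindWouldFire([10, 20], 50, 550);

// Slow runner: one timeline registers while the poll is waiting.
const slowRunner = rebindWouldFire([10, 180], 50, 550);

console.log({ fastRunner, slowRunner });
```

In this model the same fixture yields different rebind decisions purely because of when registration lands relative to the snapshot, which is the behaviour the log line is meant to confirm or rule out.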

This PR doesn't fix the race — it adds the observability needed to confirm the hypothesis before reverting/redesigning. Once correlation is established between failing runs and rebindFired=true, this PR is reverted and a real fix follows.

How

Replaces the timelinesBeforePoll/timelinesAfterPoll count comparison with a set-difference computed from explicit ID lists. Behavior is unchanged: addedDuringPoll.length > 0 is mathematically equivalent to after > before under the invariant that window.__timelines keys are never removed mid-poll.
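
The equivalence claim can be checked with a small sketch. The function names `rebindByCount`, `addedDuringPoll`, and `rebindByDiff` are hypothetical, and the inputs are assumed to be duplicate-free ID lists (as object keys would be); this is not the code from frameCapture.ts.

```typescript
// Old-style condition: compare the number of registered timelines.
function rebindByCount(before: string[], after: string[]): boolean {
  return after.length > before.length;
}

// New-style condition: compute which IDs appeared during the poll.
function addedDuringPoll(before: string[], after: string[]): string[] {
  const seen = new Set(before);
  return after.filter((id) => !seen.has(id));
}

function rebindByDiff(before: string[], after: string[]): boolean {
  return addedDuringPoll(before, after).length > 0;
}

// Keys are only ever added: both conditions agree.
const grew = { count: rebindByCount(["a"], ["a", "b"]), diff: rebindByDiff(["a"], ["a", "b"]) };
const same = { count: rebindByCount(["a", "b"], ["a", "b"]), diff: rebindByDiff(["a", "b"], ["a", "b"]) };

// If the invariant were violated (one key removed, another added), they would diverge:
// the count is unchanged, but the diff still reports a newly added ID.
const swapped = { count: rebindByCount(["a", "b"], ["a", "c"]), diff: rebindByDiff(["a", "b"], ["a", "c"]) };

console.log({ grew, same, swapped });
```

The last case shows why the invariant matters: without it, the set-difference form would fire in situations where the count comparison did not.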

The log line is unconditional — we want it on every CI shard, regardless of pass/fail.

Test plan

Trigger regression workflow (auto, on PR open) and grep CI logs for pollSubCompositionTimelines JSON lines
Re-run regression 3-4 times via empty commits or shard rerun
Verify correlation: failing shards' fixtures should show rebindFired:true and a non-empty addedDuringPoll; passing shards should show rebindFired:false
No baseline regen needed — render output is unchanged
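
The correlation step could be scripted along these lines. This is a rough sketch under stated assumptions: each log line of interest contains the string pollSubCompositionTimelines and a JSON object running from the first "{" to the end of the line. The real log prefix and exact payload are not specified here, and `parsePollRecords` and `summarize` are hypothetical helpers.

```typescript
interface PollRecord {
  rebindFired: boolean;
  addedDuringPoll: string[];
}

// Extract poll records from raw CI log text, skipping lines that do not parse.
function parsePollRecords(log: string): PollRecord[] {
  const records: PollRecord[] = [];
  for (const rawLine of log.split("\n")) {
    if (!rawLine.includes("pollSubCompositionTimelines")) continue;
    const start = rawLine.indexOf("{");
    if (start === -1) continue;
    try {
      const parsed = JSON.parse(rawLine.slice(start));
      if (typeof parsed.rebindFired === "boolean" && Array.isArray(parsed.addedDuringPoll)) {
        records.push({
          rebindFired: parsed.rebindFired,
          addedDuringPoll: parsed.addedDuringPoll,
        });
      }
    } catch (_err) {
      // Line mentioned the function name but did not carry valid JSON; ignore it.
    }
  }
  return records;
}

// Count how many renders fired the rebind and how many did not.
function summarize(records: PollRecord[]): { fired: number; notFired: number } {
  const fired = records.filter((r) => r.rebindFired).length;
  return { fired, notFired: records.length - fired };
}

// Made-up sample log, for illustration only.
const sampleLog = [
  '2026-05-18T21:30:00Z {"event":"pollSubCompositionTimelines","rebindFired":true,"addedDuringPoll":["map"]}',
  "2026-05-18T21:30:01Z unrelated output",
  '2026-05-18T21:30:02Z {"event":"pollSubCompositionTimelines","rebindFired":false,"addedDuringPoll":[]}',
].join("\n");

console.log(summarize(parsePollRecords(sampleLog)));
```

Running the summary per shard and comparing it against each shard's pass/fail status would give the correlation the test plan asks for.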

This PR is intentionally landing as draft and is intended to be reverted once the investigation completes.

Captures host IDs, timeline IDs before/after the poll, the diff set, and whether __hfForceTimelineRebind fired — emitted as a single JSON log line per render. Lets us correlate flaky regression runs (style-7-prod, gsap-letters-render-compat, style-3-prod) with whether the count-based rebind heuristic fired. Behavior is unchanged: the rebind condition `addedDuringPoll.length > 0` is equivalent to the prior `timelinesAfterPoll > timelinesBeforePoll` under the invariant that timeline IDs are never removed mid-poll. Intended to be reverted once the race condition is confirmed and patched.

Lets every shard run to completion and emit its pollSubCompositionTimelines JSON log line, so we can correlate rebindFired across all shards on a single run instead of only seeing data from whichever shard happened to fail first. Restore fail-fast: true before merging or reverting this PR.

jrusso1020 added 2 commits May 18, 2026 21:25

jrusso1020 mentioned this pull request May 19, 2026

test(producer): regenerate 7 stale regression baselines #946 (Merged, 4 tasks)

jrusso1020 closed this May 19, 2026

jrusso1020 wants to merge 2 commits into main from chore/regression-poll-observability
