chore(engine): observability for sub-composition timeline poll #945
Closed
jrusso1020 wants to merge 2 commits into
Captures host IDs, timeline IDs before/after the poll, the diff set, and whether __hfForceTimelineRebind fired — emitted as a single JSON log line per render. Lets us correlate flaky regression runs (style-7-prod, gsap-letters-render-compat, style-3-prod) with whether the count-based rebind heuristic fired. Behavior is unchanged: the rebind condition `addedDuringPoll.length > 0` is equivalent to the prior `timelinesAfterPoll > timelinesBeforePoll` under the invariant that timeline IDs are never removed mid-poll. Intended to be reverted once the race condition is confirmed and patched.
Lets every shard run to completion and emit its pollSubCompositionTimelines JSON log line, so we can correlate rebindFired across all shards on a single run instead of only seeing data from whichever shard happened to fail first. Restore fail-fast: true before merging or reverting this PR.
4 tasks
What
Adds a single structured log line per render in `pollSubCompositionTimelines` (`packages/engine/src/services/frameCapture.ts`) capturing:
- `hostIds`: every `[data-composition-id]` host on the page
- `timelineIdsBefore` / `timelineIdsAfter`: `window.__timelines` keys around the poll
- `addedDuringPoll`: the diff (which timelines registered while we waited)
- `pollMs`: wall-clock time spent polling
- `ready`: whether all host timelines were present at the deadline
- `rebindFired`: whether `__hfForceTimelineRebind()` was invoked

Why
Regression CI has been flaky on `main` and PRs since the 2026-05-18 renderer stack for async-loaded map blocks (#928 → main). Different fixtures fail on different runs:
- `27efcd0f` (v0.6.22): shard-3 `style-7-prod`, 60 failed frames
- `8163f380` (v0.6.21): shard-3 `style-7-prod`, 60 failed frames
- `gsap-letters-render-compat`, 86 failed frames
- `style-3-prod`, 7 failed frames

PSNR drops are 18–30 dB across animation-heavy intervals, the fingerprint of render time shifted ~1-2 frames vs. baseline. Hypothesis: the count-based rebind heuristic in `pollSubCompositionTimelines` (introduced 2c84c9a, amended 4bf2fa5) is racy. On slow CI runners, sub-composition scripts register their timelines between the before-snapshot and the end of the poll → rebind fires → seek timing shifts → PSNR drops. On fast runners, the same fixture registers everything synchronously → no rebind → baseline matches.

This PR doesn't fix the race; it adds the observability needed to confirm the hypothesis before reverting/redesigning. Once correlation is established between failing runs and `rebindFired=true`, this PR is reverted and a real fix follows.

How
Replaces the `timelinesBeforePoll` / `timelinesAfterPoll` count comparison with a set-difference computed from explicit ID lists. Behavior is unchanged: `addedDuringPoll.length > 0` is mathematically equivalent to `after > before` under the invariant that `window.__timelines` keys are never removed mid-poll.

The log line is unconditional: we want it on every CI shard, regardless of pass/fail.
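As a rough illustration of the set-difference described above (not the actual code in `frameCapture.ts`), the sketch below builds a record with the fields listed under "What". The names `PollLogRecord`, `diffAdded`, and `buildPollLog` are hypothetical. It also assumes host IDs and `window.__timelines` keys share one namespace, and it uses the rebind condition as a stand-in for `rebindFired`, whereas the real code records whether `__hfForceTimelineRebind()` was actually invoked.

```typescript
// Hypothetical record shape; field names follow the PR description,
// but the real type in frameCapture.ts may differ.
interface PollLogRecord {
  hostIds: string[];
  timelineIdsBefore: string[];
  timelineIdsAfter: string[];
  addedDuringPoll: string[];
  pollMs: number;
  ready: boolean;
  rebindFired: boolean;
}

// IDs present after the poll that were absent before it.
function diffAdded(before: string[], after: string[]): string[] {
  const seen = new Set(before);
  return after.filter((id) => !seen.has(id));
}

// Assumption: a host is "ready" when a timeline key equal to its ID exists.
function buildPollLog(
  hostIds: string[],
  before: string[],
  after: string[],
  pollMs: number,
): PollLogRecord {
  const addedDuringPoll = diffAdded(before, after);
  const present = new Set(after);
  return {
    hostIds,
    timelineIdsBefore: before,
    timelineIdsAfter: after,
    addedDuringPoll,
    pollMs,
    ready: hostIds.every((id) => present.has(id)),
    // Stand-in: the rebind condition, not proof that the rebind call ran.
    rebindFired: addedDuringPoll.length > 0,
  };
}

// One JSON line per render, emitted unconditionally.
const record = buildPollLog(["intro", "map"], ["intro"], ["intro", "map"], 42);
console.log(JSON.stringify(record));
```

If no key is removed mid-poll, every ID in `before` also appears in `after`, so `after.length > before.length` holds exactly when `addedDuringPoll` is non-empty. If a key were removed, the two checks could disagree, which is why the equivalence depends on that invariant.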
Test plan
- `pollSubCompositionTimelines` JSON lines
- `rebindFired:true` and a non-empty `addedDuringPoll`; passing shards should show `rebindFired:false`

This PR is intentionally landing as draft and is intended to be reverted once the investigation completes.
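One way to check the expected correlation is to tally shard outcomes against `rebindFired` from the collected logs. The sketch below is an assumption-heavy illustration. `ShardRun`, `parsePollLine`, and `correlate` are hypothetical names. The input format (each relevant log line contains one JSON object with a boolean `rebindFired`, possibly after a text prefix) is a guess, and pass/fail status is supplied by the caller rather than read from CI.

```typescript
// Hypothetical input: one entry per CI shard, with its raw log lines.
interface ShardRun {
  name: string;
  failed: boolean;
  logLines: string[];
}

interface Tally {
  failedWithRebind: number;
  failedWithoutRebind: number;
  passedWithRebind: number;
  passedWithoutRebind: number;
}

// Extracts rebindFired from a line that contains a JSON object; returns null
// for lines that are not JSON or lack a boolean rebindFired field.
function parsePollLine(line: string): { rebindFired: boolean } | null {
  const start = line.indexOf("{");
  if (start < 0) return null;
  try {
    const obj: unknown = JSON.parse(line.slice(start));
    if (typeof obj === "object" && obj !== null) {
      const value = (obj as { rebindFired?: unknown }).rebindFired;
      if (typeof value === "boolean") return { rebindFired: value };
    }
  } catch {
    // Not valid JSON; ignore the line.
  }
  return null;
}

// A shard counts as "rebind" if any of its renders logged rebindFired: true.
function correlate(shards: ShardRun[]): Tally {
  const tally: Tally = {
    failedWithRebind: 0,
    failedWithoutRebind: 0,
    passedWithRebind: 0,
    passedWithoutRebind: 0,
  };
  for (const shard of shards) {
    const fired = shard.logLines.some(
      (line) => parsePollLine(line)?.rebindFired === true,
    );
    if (shard.failed && fired) tally.failedWithRebind += 1;
    else if (shard.failed) tally.failedWithoutRebind += 1;
    else if (fired) tally.passedWithRebind += 1;
    else tally.passedWithoutRebind += 1;
  }
  return tally;
}
```

Under the hypothesis, the off-diagonal counts (`failedWithoutRebind`, `passedWithRebind`) should stay at or near zero. Non-trivial counts there would suggest the rebind heuristic is not the only cause of the failures.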