Harden Playground tests against transient network errors #1690
Merged
The Win32_x64_D3D11 Playground job fails intermittently with five identical "[Error] SyntaxError: JSON.parse Error: Unexpected input at position:0" lines followed by "[Log] Running the playground failed." on commits that don't touch playground code.

Root cause: validation_native.js fetches each test's snippet from https://snippet.babylonjs.com/<id>/<rev> and unconditionally calls JSON.parse(xmlHttp.responseText) on readyState === 4, ignoring xmlHttp.status. When the snippet service returns a transient error (5xx, 429, gateway timeout, empty body), the parse fails and falls through to the catch which calls onError. The retry policy is maxRetry=5 with a fixed 500ms delay -- a 2-second total budget that cannot ride out a normal CDN/upstream blip.

Three changes:

1. Check xmlHttp.status === 200 before parsing. Non-200 responses are logged with the status code and the playground id, then routed to onError instead of bubbling up as a misleading SyntaxError.
2. Increase maxRetry from 5 to 8.
3. Replace the fixed 500ms delay with exponential backoff capped at 30 seconds (500ms, 1s, 2s, 4s, 8s, 16s, 30s).

Total budget grows from ~2s to ~60s, which is sufficient to ride out typical service blips without changing the eventual fail-fast behavior on persistent outages.

Validates against the canonical snippet loader in BabylonJS/Babylon.js (packages/tools/snippetLoader/src/fetchSnippet.ts) which also checks response.ok before calling response.json().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
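The status gate and capped backoff described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `backoffDelayMs`, `fetchSnippet`, and the injected `sendRequest` and `schedule` callbacks are hypothetical names. Only the constants (8 attempts, 500ms base delay, 30s cap) come from the commit message.

```javascript
// Constants taken from the commit message; all function names are hypothetical.
const MAX_ATTEMPTS = 8;
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30000;

// Delay before retry number `retryIndex` (0-based): 500ms * 2^retryIndex, capped at 30s.
function backoffDelayMs(retryIndex) {
    return Math.min(BASE_DELAY_MS * Math.pow(2, retryIndex), MAX_DELAY_MS);
}

// `sendRequest(url, cb)` is an assumed transport that calls cb(status, responseText)
// once the request completes. `schedule` defaults to setTimeout.
function fetchSnippet(url, sendRequest, onSuccess, onError, schedule = setTimeout, attempt = 0) {
    sendRequest(url, (status, responseText) => {
        let snippet;
        let parsed = false;
        if (status === 200) {
            try {
                snippet = JSON.parse(responseText);
                parsed = true;
            } catch (e) {
                console.log(`Snippet ${url}: invalid JSON body (${e.message})`);
            }
        } else {
            // Report the real cause instead of a misleading SyntaxError.
            console.log(`Snippet ${url}: HTTP status ${status}`);
        }

        if (parsed) {
            onSuccess(snippet);
            return;
        }
        if (attempt + 1 >= MAX_ATTEMPTS) {
            onError(`Snippet ${url}: giving up after ${MAX_ATTEMPTS} attempts`);
            return;
        }
        schedule(
            () => fetchSnippet(url, sendRequest, onSuccess, onError, schedule, attempt + 1),
            backoffDelayMs(attempt)
        );
    });
}
```

With 8 attempts there are 7 waits between them, which sum to 61.5 seconds. This matches the "~60s" budget quoted above.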
Pull request overview
Improves the robustness of BabylonNative’s Playground visual validation runner by hardening the snippet-fetch path against transient HTTP failures from snippet.babylonjs.com, reducing flaky CI failures unrelated to code changes.
Changes:
- Add an HTTP status check before attempting to parse the snippet response as JSON.
- Increase retry attempts (5 → 8) and implement exponential backoff (capped at 30s).
- Improve the final failure log message to reflect the number of attempts.
The readystatechange listener is registered via addEventListener, not via the onreadystatechange property -- those are separate slots in the XHR API. Setting the property to null does not detach the listener, just as the reviewer observed. Spec also guarantees readystatechange fires only once on transition to DONE, so the line was a misleading no-op. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
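The difference between the two registration slots can be illustrated with a small model. `FakeXhr` below is a hypothetical stand-in built on the standard `EventTarget` class. It is not the XMLHttpRequest implementation used by BabylonNative, and it only mimics the handler-property behavior described in this commit message.

```javascript
// Hypothetical model: the handler property and addEventListener are separate slots.
class FakeXhr extends EventTarget {
    constructor() {
        super();
        this._handler = null;
        // The property-based handler is dispatched through its own internal slot.
        this.addEventListener("readystatechange", (event) => {
            if (typeof this._handler === "function") {
                this._handler.call(this, event);
            }
        });
    }
    get onreadystatechange() {
        return this._handler;
    }
    set onreadystatechange(fn) {
        this._handler = fn;
    }
}

// Registers a listener via addEventListener, applies `detach`, fires the event,
// and returns how many times the listener ran.
function countCalls(detach) {
    const xhr = new FakeXhr();
    let calls = 0;
    const listener = () => { calls++; };
    xhr.addEventListener("readystatechange", listener);
    detach(xhr, listener);
    xhr.dispatchEvent(new Event("readystatechange"));
    return calls;
}

// Clearing the property leaves the addEventListener listener attached.
const callsAfterNullingProperty = countCalls((xhr) => { xhr.onreadystatechange = null; });
// removeEventListener with the same function reference actually detaches it.
const callsAfterRemove = countCalls((xhr, listener) => {
    xhr.removeEventListener("readystatechange", listener);
});
```

In this model the listener still fires once after the property is set to null, and does not fire after `removeEventListener`.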
ryantrem
approved these changes
May 8, 2026
…etry # Conflicts: # Apps/Playground/Scripts/validation_native.js
Replaces the manual XMLHttpRequest + retry loop in `loadPG()` with a
single `BABYLON.Tools.LoadFile()` call so that snippet fetches and
in-snippet texture / scene fetches all share the same Babylon-provided
retry plumbing. The custom retry block (8 attempts, exponential backoff,
explicit status-200 gate) is removed because `FileTools.LoadFile`'s
internal `retryLoop` already implements retries via the global
`FileToolsOptions.DefaultRetryStrategy`, and that strategy is now
configured up front for the test framework.
The configured strategy broadens the upstream `ExponentialBackoff`
default (which retries only when `request.status === 0`) to also cover
transient HTTP error responses:
- `0` — network drop / connection reset (existing)
- `429` — rate limited
- `5xx` — server error / gateway timeout / etc.
Up to 5 attempts with `500ms * 2^N` backoff. Applies to every
`Tools.LoadFile` call in the test framework, including:
- The snippet fetch in `loadPG()` (this PR's primary concern, same
scenario as before).
- The reference-image fetch in `runTest()` (already on `LoadFile`).
- Every texture / scene / asset URL loaded from inside each
playground's `createScene()` body via `_loadFile` (e.g.
`new BABYLON.Texture("...exr", scene)`).
That last category covers the `EXR Loader` flake observed on
`Win32_x64_V8_D3D11` (run 25930475253): a single transient failure
fetching `green-door.exr` from the assets CDN caused Babylon's fallback
red-and-black checkerboard texture to be substituted, which validated
as a ~110k pixel diff against the reference. Under the new strategy the
fetch will retry on 5xx/429 instead of falling back on the first error.
Net `validation_native.js` change: +57 / -85 lines (-28 net) plus the
21-line strategy override block at file scope. The strategy override
is a one-time global mutation in the test-framework entry; it does not
affect any non-test BabylonNative app.
[Created by Copilot on behalf of @bghgary]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
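A strategy of the kind described in this commit message might look roughly like the sketch below. It is not the 21-line block from the PR. `transientRetryStrategy` and `isTransientStatus` are hypothetical names, and treating "up to 5 attempts" as a cap of 5 on `retryIndex` is an assumption. The `(url, request, retryIndex)` signature and the convention of returning `-1` to stop retrying follow Babylon.js's retry-strategy callbacks.

```javascript
// Assumption: "up to 5 attempts" is modeled as retryIndex < 5.
const MAX_RETRIES = 5;
const BASE_INTERVAL_MS = 500;

// Statuses treated as transient: 0 (network drop), 429 (rate limited), any 5xx.
function isTransientStatus(status) {
    return status === 0 || status === 429 || (status >= 500 && status < 600);
}

// Returns the delay in milliseconds before the next retry, or -1 to stop retrying.
function transientRetryStrategy(url, request, retryIndex) {
    if (!isTransientStatus(request.status) || retryIndex >= MAX_RETRIES) {
        return -1;
    }
    return BASE_INTERVAL_MS * Math.pow(2, retryIndex);
}

// One-time global override; guarded so the sketch also runs outside Babylon.
if (typeof BABYLON !== "undefined" && BABYLON.Tools) {
    BABYLON.Tools.DefaultRetryStrategy = transientRetryStrategy;
}
```

A 404 or other 4xx response (apart from 429) still fails immediately, so a genuinely missing asset is not retried.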
Context
Two transient-CDN failures observed in CI:
- Snippet fetch: `Win32_x64_D3D11 / build` intermittently fails with 5× `SyntaxError: JSON.parse Error: Unexpected input at position:0`, when the snippet service returns 5xx/429/empty body and `loadPG()`'s hand-rolled XHR parses the response without checking `xmlHttp.status`.
- Asset fetch: `EXR Loader` visualization test flaked once on `Win32_x64_V8_D3D11` (run 25930475253) with a ~110k-pixel diff. The artifact is Babylon's `EngineStore.FallbackTexture` checkerboard, substituted because `green-door.exr` fetch returned an error. Babylon's `FileToolsOptions.DefaultRetryStrategy` only retries on `request.status === 0`, so HTTP 5xx/429 → no retry → fallback.

Approach
Two changes in `Apps/Playground/Scripts/validation_native.js`:
1. Replace the hand-rolled XHR + retry in `loadPG()` with `BABYLON.Tools.LoadFile()`. Reuses the upstream `retryLoop` in `FileTools.LoadFile`; no more custom retry state.
2. Broaden `BABYLON.Tools.DefaultRetryStrategy` for the test framework to also retry `429` and `5xx`, not just `status === 0`. Up to 5 attempts, `500 * 2^N` backoff. One-time global mutation; doesn't affect non-test BabylonNative apps.

The single strategy override covers snippet fetches (`loadPG`), reference-image fetches (`runTest`), and every asset URL loaded from inside a playground's `createScene()` body, closing the EXR Loader flake surface without per-call changes.

[Created by Copilot on behalf of @bghgary]
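For the first change, a snippet fetch routed through `BABYLON.Tools.LoadFile()` could take roughly the shape below. This is a sketch, not the `loadPG()` from the PR. `loadSnippet` and its optional `loadFile` parameter are hypothetical, added so the function can be exercised without Babylon loaded. The argument order (`url`, `onSuccess`, `onProgress`, `offlineProvider`, `useArrayBuffer`, `onError`) follows the Babylon.js `Tools.LoadFile` signature.

```javascript
// Hypothetical helper: fetch a snippet as text and hand the parsed JSON to onSuccess.
// Retries are left entirely to LoadFile and the globally configured retry strategy.
function loadSnippet(snippetUrl, onSuccess, onError, loadFile) {
    // Assumption: fall back to the global BABYLON.Tools.LoadFile when no loader is injected.
    const load = loadFile || ((...args) => BABYLON.Tools.LoadFile(...args));

    load(
        snippetUrl,
        (data) => {
            let snippet;
            try {
                snippet = JSON.parse(data);
            } catch (e) {
                onError(`Invalid snippet JSON from ${snippetUrl}: ${e.message}`);
                return;
            }
            onSuccess(snippet);
        },
        undefined, // onProgress: not needed here
        undefined, // offlineProvider: none
        false,     // useArrayBuffer: the snippet is text
        (request, exception) => {
            // Called by the loader once it has given up retrying.
            const status = request ? request.status : "unknown";
            onError(`Snippet fetch failed for ${snippetUrl} (status ${status})`);
        }
    );
}
```

Because no retry state lives in the caller, the same function benefits from any later change to the global strategy without modification.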