
Harden Playground tests against transient network errors #1690

Merged
bghgary merged 4 commits into BabylonJS:master from bghgary:harden-playground-retry
May 15, 2026

Conversation

@bghgary
Contributor

@bghgary bghgary commented May 7, 2026

Context

Two transient-CDN failures observed in CI:

  1. Snippet fetch: Win32_x64_D3D11 / build intermittently fails with 5× SyntaxError: JSON.parse Error: Unexpected input at position:0, when the snippet service returns 5xx/429/empty body and loadPG()'s hand-rolled XHR parses the response without checking xmlHttp.status.

  2. Asset fetch: EXR Loader visualization test flaked once on Win32_x64_V8_D3D11 (run 25930475253) with a ~110k-pixel diff. The artifact is Babylon's EngineStore.FallbackTexture checkerboard, substituted because green-door.exr fetch returned an error. Babylon's FileToolsOptions.DefaultRetryStrategy only retries on request.status === 0, so HTTP 5xx/429 → no retry → fallback.

Approach

Two changes in Apps/Playground/Scripts/validation_native.js:

  1. Replace the hand-rolled XHR + retry in loadPG() with BABYLON.Tools.LoadFile(). Reuses the upstream retryLoop in FileTools.LoadFile; no more custom retry state.

  2. Broaden BABYLON.Tools.DefaultRetryStrategy for the test framework to also retry 429 and 5xx, not just status === 0. Up to 5 attempts with 500 ms * 2^N backoff. This is a one-time global mutation in the test framework and doesn't affect non-test BabylonNative apps.

The single strategy override covers snippet fetches (loadPG), reference-image fetches (runTest), and every asset URL loaded from inside a playground's createScene() body — closing the EXR Loader flake surface without per-call changes.

[Created by Copilot on behalf of @bghgary]
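
The first change in the Approach above can be sketched roughly as follows. This is a minimal illustration, not the code in validation_native.js: the `loadFile` parameter is injected so the sketch runs without Babylon (in the test framework it would be `BABYLON.Tools.LoadFile`), and the `jsonPayload` / `code` response fields are an assumption about the snippet service's response shape.

```javascript
// Hypothetical sketch of loadPG() delegating to a LoadFile-style function.
// `loadFile` is assumed to follow the argument order
// (url, onSuccess, onProgress, offlineProvider, useArrayBuffer, onError).
function loadPG(playgroundId, loadFile, onSuccess, onError) {
    // "#ABC123#4" -> "/ABC123/4", matching https://snippet.babylonjs.com/<id>/<rev>
    const url = "https://snippet.babylonjs.com" + playgroundId.replace(/#/g, "/");
    loadFile(
        url,
        (data) => {
            try {
                // Assumption: the playground source is nested inside a JSON
                // string field named jsonPayload.
                const snippet = JSON.parse(data);
                onSuccess(JSON.parse(snippet.jsonPayload).code.toString());
            } catch (e) {
                onError(e);
            }
        },
        undefined, // onProgress (unused)
        undefined, // offlineProvider (unused)
        false,     // useArrayBuffer: the snippet is text
        (request, exception) => onError(exception || new Error("Failed to load " + url))
    );
}
```

With this shape, retries are no longer the caller's concern: whatever retry strategy the loader is configured with applies to the snippet fetch as well.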

The Win32_x64_D3D11 Playground job fails intermittently with five
identical "[Error] SyntaxError: JSON.parse Error: Unexpected input at
position:0" lines followed by "[Log] Running the playground failed."
on commits that don't touch playground code.

Root cause: validation_native.js fetches each test's snippet from
https://snippet.babylonjs.com/<id>/<rev> and unconditionally calls
JSON.parse(xmlHttp.responseText) on readyState === 4, ignoring
xmlHttp.status. When the snippet service returns a transient error
(5xx, 429, gateway timeout, empty body), the parse fails and falls
through to the catch which calls onError. The retry policy is
maxRetry=5 with a fixed 500ms delay -- a 2-second total budget that
cannot ride out a normal CDN/upstream blip.

Three changes:

1. Check xmlHttp.status === 200 before parsing. Non-200 responses are
   logged with the status code and the playground id, then routed to
   onError instead of bubbling up as a misleading SyntaxError.
2. Increase maxRetry from 5 to 8.
3. Replace the fixed 500ms delay with exponential backoff capped at
   30 seconds (500ms, 1s, 2s, 4s, 8s, 16s, 30s). Total budget grows
   from ~2s to ~60s, which is sufficient to ride out typical service
   blips without changing the eventual fail-fast behavior on persistent
   outages.
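
The status gate and the capped backoff schedule listed above can be sketched as below. The helper names `backoffDelayMs` and `parseSnippetResponse` are hypothetical and chosen for illustration; they are not the names used in validation_native.js.

```javascript
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30000;
const MAX_RETRY = 8;

// Delay before retry `retryIndex` (0-based): 500 ms doubled each time, capped at 30 s.
function backoffDelayMs(retryIndex) {
    return Math.min(BASE_DELAY_MS * Math.pow(2, retryIndex), MAX_DELAY_MS);
}

// Parse only a 200 response; otherwise report the status instead of letting
// JSON.parse surface a misleading SyntaxError.
function parseSnippetResponse(status, responseText, playgroundId) {
    if (status !== 200) {
        return { ok: false, error: `Snippet fetch for ${playgroundId} returned HTTP ${status}` };
    }
    try {
        return { ok: true, snippet: JSON.parse(responseText) };
    } catch (e) {
        return { ok: false, error: `Snippet ${playgroundId} returned invalid JSON: ${e.message}` };
    }
}

// 8 attempts leave 7 waits between them.
const delays = [];
for (let i = 0; i < MAX_RETRY - 1; i++) {
    delays.push(backoffDelayMs(i));
}
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays.join(", "), "total:", totalMs); // 500, 1000, 2000, 4000, 8000, 16000, 30000 total: 61500
```

The seven waits sum to 61.5 s, which is where the "~60s" budget comes from.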

Validates against the canonical snippet loader in BabylonJS/Babylon.js
(packages/tools/snippetLoader/src/fetchSnippet.ts) which also checks
response.ok before calling response.json().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 7, 2026 22:26
Contributor

Copilot AI left a comment


Pull request overview

Improves the robustness of BabylonNative’s Playground visual validation runner by hardening the snippet-fetch path against transient HTTP failures from snippet.babylonjs.com, reducing flaky CI failures unrelated to code changes.

Changes:

  • Add an HTTP status check before attempting to parse the snippet response as JSON.
  • Increase retry attempts (5 → 8) and implement exponential backoff (capped at 30s).
  • Improve the final failure log message to reflect the number of attempts.

Comment thread Apps/Playground/Scripts/validation_native.js Outdated
The readystatechange listener is registered via addEventListener, not via the onreadystatechange property -- those are separate slots in the XHR API. Setting the property to null does not detach the listener, just as the reviewer observed. Spec also guarantees readystatechange fires only once on transition to DONE, so the line was a misleading no-op.
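
The point about separate slots can be illustrated with a small sketch. A plain `EventTarget` stands in for `XMLHttpRequest` here so the example runs outside a browser; that substitution is an assumption for illustration only, since on a real XHR `onreadystatechange` is an event-handler attribute rather than an ordinary property.

```javascript
// EventTarget is a stand-in for XMLHttpRequest in this sketch.
const target = new EventTarget();
let calls = 0;
target.addEventListener("readystatechange", () => {
    calls++;
});

// Clearing the handler property does not detach a listener that was
// registered through addEventListener.
target.onreadystatechange = null;
target.dispatchEvent(new Event("readystatechange"));

console.log(calls); // 1: the listener still ran
```

Detaching such a listener requires `removeEventListener` with the same function reference.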

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bghgary and others added 2 commits May 15, 2026 13:13
…etry

# Conflicts:
#	Apps/Playground/Scripts/validation_native.js
Replaces the manual XMLHttpRequest + retry loop in `loadPG()` with a
single `BABYLON.Tools.LoadFile()` call so that snippet fetches and
in-snippet texture / scene fetches all share the same Babylon-provided
retry plumbing. The custom retry block (8 attempts, exponential backoff,
explicit status-200 gate) is removed because `FileTools.LoadFile`'s
internal `retryLoop` already implements retries via the global
`FileToolsOptions.DefaultRetryStrategy`, and that strategy is now
configured up front for the test framework.

The configured strategy broadens the upstream `ExponentialBackoff`
default (which retries only when `request.status === 0`) to also cover
transient HTTP error responses:

  - `0`              — network drop / connection reset (existing)
  - `429`            — rate limited
  - `5xx`            — server error / gateway timeout / etc.

Up to 5 attempts with `500ms * 2^N` backoff. Applies to every
`Tools.LoadFile` call in the test framework, including:

  - The snippet fetch in `loadPG()` (this PR's primary concern, same
    scenario as before).
  - The reference-image fetch in `runTest()` (already on `LoadFile`).
  - Every texture / scene / asset URL loaded from inside each
    playground's `createScene()` body via `_loadFile` (e.g.
    `new BABYLON.Texture("...exr", scene)`).

That last category covers the `EXR Loader` flake observed on
`Win32_x64_V8_D3D11` (run 25930475253): a single transient failure
fetching `green-door.exr` from the assets CDN caused Babylon's fallback
red-and-black checkerboard texture to be substituted, which validated
as a ~110k pixel diff against the reference. Under the new strategy the
fetch will retry on 5xx/429 instead of falling back on the first error.
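
A broadened strategy of the kind described above might look like the following sketch. It assumes a retry strategy is a function `(url, request, retryIndex)` that returns a delay in milliseconds, or `-1` to stop retrying. The name `testRetryStrategy` is hypothetical, and whether the real override counts attempts or retries toward its limit of 5 is not stated here; this sketch simply stops once `retryIndex` reaches 5.

```javascript
const MAX_ATTEMPTS = 5;       // "up to 5 attempts" from the commit message
const BASE_INTERVAL_MS = 500; // backoff is 500 ms * 2^N

// Assumed strategy shape: return the delay in ms before the next retry,
// or -1 to give up.
function testRetryStrategy(url, request, retryIndex) {
    const status = request.status;
    const transient =
        status === 0 ||                   // network drop / connection reset
        status === 429 ||                 // rate limited
        (status >= 500 && status < 600);  // server error / gateway timeout
    if (!transient || retryIndex >= MAX_ATTEMPTS) {
        return -1;
    }
    return BASE_INTERVAL_MS * Math.pow(2, retryIndex);
}

// In the test framework this would be installed once at file scope, e.g.:
// BABYLON.Tools.DefaultRetryStrategy = testRetryStrategy;
```

Non-transient statuses such as 404 still fail immediately, so a genuinely missing asset is not masked by retries.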

Net `validation_native.js` change: +57 / -85 lines (-28 net) plus the
21-line strategy override block at file scope. The strategy override
is a one-time global mutation in the test-framework entry; it does not
affect any non-test BabylonNative app.

[Created by Copilot on behalf of @bghgary]

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bghgary bghgary changed the title from "Harden Playground snippet-fetch retry against transient failures" to "Harden Playground tests against transient network errors" May 15, 2026
@bghgary bghgary requested a review from ryantrem May 15, 2026 21:57
@bghgary bghgary merged commit 5c4081e into BabylonJS:master May 15, 2026
28 checks passed
@bghgary bghgary deleted the harden-playground-retry branch May 15, 2026 22:13

3 participants