TextDecoder: accept all WHATWG utf-8 encoding labels by bkaradzic-microsoft · Pull Request #198 · BabylonJS/JsRuntimeHost

bkaradzic-microsoft · 2026-06-13T03:08:40Z

What

TextDecoder's constructor only accepted the exact labels "utf-8" and "UTF-8" and threw for every other spelling. This PR makes it accept every UTF-8 label defined by the WHATWG Encoding Standard.

Why

Per the WHATWG Encoding Standard, an encoding label is matched after stripping leading/trailing ASCII whitespace and ASCII-lowercasing, and several labels all decode as UTF-8: utf-8, utf8, unicode-1-1-utf-8, unicode11utf8, unicode20utf8, x-unicode20utf8.
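The normalization step can be sketched in standalone C++. This is a minimal illustration, not the actual code from TextDecoder.cpp. The function name `NormalizeEncodingLabel` is hypothetical, and "ASCII whitespace" follows the WHATWG Infra definition (TAB, LF, FF, CR, SPACE; vertical tab is not included). Lowercasing is done by hand rather than with `std::tolower`, so it is locale-independent and only changes A-Z.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: normalizes an encoding label the way the WHATWG
// "get an encoding" algorithm does before comparing it with known labels.
std::string NormalizeEncodingLabel(std::string_view label)
{
    // ASCII whitespace per the WHATWG Infra Standard: TAB, LF, FF, CR, SPACE.
    constexpr std::string_view kAsciiWhitespace = "\t\n\f\r ";

    const auto first = label.find_first_not_of(kAsciiWhitespace);
    if (first == std::string_view::npos)
    {
        return {}; // empty or all-whitespace label
    }
    const auto last = label.find_last_not_of(kAsciiWhitespace);

    std::string result{label.substr(first, last - first + 1)};
    for (char& c : result)
    {
        // ASCII-lowercase only; bytes outside A-Z are left untouched.
        if (c >= 'A' && c <= 'Z')
        {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return result;
}
```

With this, `"  UTF8\t"` normalizes to `"utf8"`, which can then be compared against the list of UTF-8 labels with a plain string equality check.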

Real consumers rely on this. The Babylon.js glTF/Draco loader constructs new TextDecoder("utf8") (no hyphen). Under the old check that call threw, and decoding aborted mid-load. In Babylon Native, the aborted load left the loader in a state that drove a native out-of-bounds write, which surfaced as non-deterministic heap corruption (STATUS_HEAP_CORRUPTION, 0xC0000374) in the Draco mesh-compression validation tests.

Fix

Normalize the label per the spec (trim ASCII whitespace + ASCII-lowercase) and accept the full set of UTF-8 labels; still throw for genuinely unsupported encodings.
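Once the label is normalized, accepting every UTF-8 alias reduces to a lookup in a fixed list, and anything else is still rejected. The sketch below assumes its input has already been trimmed and ASCII-lowercased. `IsUtf8Label` and `ValidateEncodingLabel` are hypothetical names, and `std::invalid_argument` stands in for the `Napi::Error` the polyfill actually throws. Keeping the six labels in one array, rather than a chain of `!=` comparisons, makes the list easy to check against the spec.

```cpp
#include <algorithm>
#include <array>
#include <stdexcept>
#include <string>
#include <string_view>

// The labels the WHATWG Encoding Standard maps to the UTF-8 encoding.
constexpr std::array<std::string_view, 6> kUtf8Labels = {
    "unicode-1-1-utf-8",
    "unicode11utf8",
    "unicode20utf8",
    "utf-8",
    "utf8",
    "x-unicode20utf8",
};

// Hypothetical helper: `normalizedLabel` must already be trimmed of ASCII
// whitespace and ASCII-lowercased.
bool IsUtf8Label(std::string_view normalizedLabel)
{
    return std::find(kUtf8Labels.begin(), kUtf8Labels.end(), normalizedLabel) != kUtf8Labels.end();
}

// Hypothetical constructor check: accept any UTF-8 label, reject the rest.
// std::invalid_argument stands in for the Napi::Error thrown by the polyfill.
void ValidateEncodingLabel(std::string_view normalizedLabel)
{
    if (!IsUtf8Label(normalizedLabel))
    {
        throw std::invalid_argument("TextDecoder: unsupported encoding '" + std::string{normalizedLabel} + "'");
    }
}
```

For example, `ValidateEncodingLabel("unicode11utf8")` returns normally, while `ValidateEncodingLabel("utf-16")` throws, matching the still-rejected case described in the tests.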

Verification

Two BabylonNative Playground Draco validation tests that previously crashed with heap corruption (GLTF Serializer KHR draco mesh compression, GLTF Buggy with Draco Mesh Compression) now pass (3/3 runs each) with this fix vendored in; a third (GLTF Box with bad Draco normalized flag) no longer crashes.
Adds unit tests covering the utf8 label, case/whitespace variants, the other UTF-8 aliases, and a still-rejected encoding (utf-16).

The TextDecoder constructor only accepted the exact labels "utf-8"/"UTF-8" and threw for every other spelling.

Per the WHATWG Encoding Standard, an encoding label is matched after stripping leading/trailing ASCII whitespace and ASCII-lowercasing, and several labels ("utf8", "unicode-1-1-utf-8", "unicode11utf8", "unicode20utf8", "x-unicode20utf8") all map to UTF-8. Consumers such as the Babylon.js glTF/Draco loader construct `new TextDecoder("utf8")`; the throw aborted decoding mid-load and (in Babylon Native) left the loader in a state that drove a native out-of-bounds write, observed as non-deterministic heap corruption on the Draco validation tests.

Normalize the label per spec and accept all UTF-8 labels. Adds regression tests for "utf8", case/whitespace variants, the other aliases, and a still-rejected unsupported encoding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR updates the TextDecoder polyfill to accept all UTF-8 encoding labels recognized by the WHATWG Encoding Standard (e.g., utf8, unicode11utf8), fixing real-world incompatibilities (notably Babylon.js glTF/Draco loader usage) and adds unit tests to prevent regressions.

Changes:

Normalize the constructor’s encoding label (ASCII trim + ASCII lowercase) and accept all WHATWG UTF-8 labels.
Preserve rejection behavior for unsupported encodings (e.g., utf-16).
Add unit tests covering accepted aliases and normalization behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
Tests/UnitTests/Scripts/tests.ts	Adds unit tests for UTF-8 label aliases, case/whitespace normalization, and unsupported encodings.
Polyfills/TextDecoder/Source/TextDecoder.cpp	Implements WHATWG-style label normalization and accepts the full set of UTF-8 labels.


+                    label != "unicode20utf8" &&
+                    label != "x-unicode20utf8")
                {
                    throw Napi::Error::New(Env(), "TextDecoder: unsupported encoding '" + encoding + "', only 'utf-8' is supported");


Copilot AI review requested due to automatic review settings June 13, 2026 03:08

Copilot started reviewing on behalf of bkaradzic-microsoft June 13, 2026 03:10

Copilot AI reviewed Jun 13, 2026


Comment thread Polyfills/TextDecoder/Source/TextDecoder.cpp

bkaradzic-microsoft wants to merge 1 commit into BabylonJS:main from bkaradzic-microsoft:fix/textdecoder-whatwg-labels
