
TextDecoder: accept all WHATWG utf-8 encoding labels #198

Open
bkaradzic-microsoft wants to merge 1 commit into BabylonJS:main from bkaradzic-microsoft:fix/textdecoder-whatwg-labels

@bkaradzic-microsoft

What

TextDecoder's constructor only accepted the exact labels "utf-8" and "UTF-8", throwing for every other spelling. This PR makes it accept all WHATWG-spec UTF-8 labels.

Why

Per the WHATWG Encoding Standard, an encoding label is matched after stripping leading/trailing ASCII whitespace and ASCII-lowercasing, and several labels all decode as UTF-8: utf-8, utf8, unicode-1-1-utf-8, unicode11utf8, unicode20utf8, x-unicode20utf8.
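The normalization step can be sketched in C++ roughly as follows. `NormalizeEncodingLabel` is a hypothetical helper written for illustration, not the code in this PR. It strips the spec's ASCII whitespace set (TAB, LF, FF, CR, SPACE) from both ends and lowercases only `A`-`Z`; it avoids `std::tolower` because that function depends on the current C locale.

```cpp
#include <string>

// Hypothetical helper (not the PR's actual code): normalize an encoding label
// the way the WHATWG Encoding Standard does before matching it.
//  1. Strip leading/trailing ASCII whitespace (TAB, LF, FF, CR, SPACE).
//  2. ASCII-lowercase the remainder; non-ASCII bytes are left untouched,
//     so the result does not depend on the C/C++ locale.
std::string NormalizeEncodingLabel(const std::string& label)
{
    auto isAsciiWhitespace = [](char c) {
        return c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    };

    std::string::size_type begin = 0;
    std::string::size_type end = label.size();
    while (begin < end && isAsciiWhitespace(label[begin]))
        ++begin;
    while (end > begin && isAsciiWhitespace(label[end - 1]))
        --end;

    std::string result = label.substr(begin, end - begin);
    for (char& c : result)
    {
        // Only 'A'..'Z' change; every other byte passes through as-is.
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return result;
}
```

With this helper, `"  UTF8\n"` normalizes to `"utf8"`, which can then be compared against the lowercase UTF-8 labels listed above.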

Real consumers rely on this. The Babylon.js glTF/Draco loader constructs new TextDecoder("utf8") (no hyphen). The old check threw for that label, which aborted decoding mid-load. In Babylon Native the aborted load left the loader in a state that drove a native out-of-bounds write, which surfaced as non-deterministic heap corruption (STATUS_HEAP_CORRUPTION, 0xC0000374) on the Draco mesh-compression validation tests.

Fix

Normalize the label per the spec (trim ASCII whitespace + ASCII-lowercase) and accept the full set of UTF-8 labels; still throw for genuinely unsupported encodings.
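A minimal sketch of the accept/reject decision, assuming the label has already been ASCII-trimmed and ASCII-lowercased as the spec requires. `IsUtf8Label` and `RequireUtf8Label` are hypothetical names, and `std::invalid_argument` stands in for the `Napi::Error` the polyfill actually throws, so the sketch builds without Node-API.

```cpp
#include <array>
#include <stdexcept>
#include <string>

// The six labels the WHATWG Encoding Standard maps to UTF-8. Comparisons
// assume the label is already normalized, so only lowercase spellings appear.
const std::array<const char*, 6> kUtf8Labels = {
    "unicode-1-1-utf-8", "unicode11utf8", "unicode20utf8",
    "utf-8", "utf8", "x-unicode20utf8",
};

// Returns true if the (already normalized) label names UTF-8.
bool IsUtf8Label(const std::string& normalizedLabel)
{
    for (const char* candidate : kUtf8Labels)
    {
        if (normalizedLabel == candidate)
            return true;
    }
    return false;
}

// Hypothetical stand-in for the constructor check: accept any UTF-8 label,
// throw for every other encoding. The message echoes the caller's original
// spelling, as the PR's error message does.
void RequireUtf8Label(const std::string& normalizedLabel, const std::string& originalLabel)
{
    if (!IsUtf8Label(normalizedLabel))
    {
        throw std::invalid_argument(
            "TextDecoder: unsupported encoding '" + originalLabel + "', only 'utf-8' is supported");
    }
}
```

Keeping the labels in one table makes the accepted set easy to audit against the spec's UTF-8 entry; for six short strings a linear scan is cheap, so a hash set would add little.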

Verification

  • Two BabylonNative Playground Draco validation tests that previously crashed with heap corruption (GLTF Serializer KHR draco mesh compression, GLTF Buggy with Draco Mesh Compression) now pass (3/3 runs each) with this fix vendored in; a third (GLTF Box with bad Draco normalized flag) no longer crashes.
  • Adds unit tests covering the utf8 label, case/whitespace variants, the other UTF-8 aliases, and a still-rejected encoding (utf-16).

The TextDecoder constructor only accepted the exact labels "utf-8"/"UTF-8"
and threw for every other spelling. Per the WHATWG Encoding Standard, an
encoding label is matched after stripping leading/trailing ASCII whitespace
and ASCII-lowercasing, and several labels ("utf8", "unicode-1-1-utf-8",
"unicode11utf8", "unicode20utf8", "x-unicode20utf8") all map to UTF-8.

Consumers such as the Babylon.js glTF/Draco loader construct
`new TextDecoder("utf8")`; the throw aborted decoding mid-load and (in
Babylon Native) left the loader in a state that drove a native out-of-bounds
write, observed as non-deterministic heap corruption on the Draco
validation tests.

Normalize the label per spec and accept all UTF-8 labels. Adds regression
tests for "utf8", case/whitespace variants, the other aliases, and a
still-rejected unsupported encoding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 13, 2026 03:08

Copilot AI left a comment

Pull request overview

This PR updates the TextDecoder polyfill to accept all UTF-8 encoding labels recognized by the WHATWG Encoding Standard (e.g., utf8, unicode11utf8), fixing real-world incompatibilities (notably Babylon.js glTF/Draco loader usage) and adds unit tests to prevent regressions.

Changes:

  • Normalize the constructor’s encoding label (ASCII trim + ASCII lowercase) and accept all WHATWG UTF-8 labels.
  • Preserve rejection behavior for unsupported encodings (e.g., utf-16).
  • Add unit tests covering accepted aliases and normalization behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files and descriptions:

  • Tests/UnitTests/Scripts/tests.ts: Adds unit tests for UTF-8 label aliases, case/whitespace normalization, and unsupported encodings.
  • Polyfills/TextDecoder/Source/TextDecoder.cpp: Implements WHATWG-style label normalization and accepts the full set of UTF-8 labels.

```cpp
// (earlier label comparisons not shown)
    label != "unicode20utf8" &&
    label != "x-unicode20utf8")
{
    throw Napi::Error::New(Env(), "TextDecoder: unsupported encoding '" + encoding + "', only 'utf-8' is supported");
}
```