Summary
When a CSL locale fails to load, quarto-citeproc silently falls back to English — English terms (and, et al.), English curly quotes, and English date formats — and emits no warning or error. A document that requests French, German, etc. citations can render with English citation furniture and the user is never told anything went wrong.
This is a correctness + observability bug: the rendered output is wrong, and there is no signal.
Where it happens
crates/quarto-citeproc/src/locale.rs:
- load_embedded_locale (lines ~240–266) swallows every failure mode. File-not-found in the embed, non-UTF-8 bytes, and XML parse errors all collapse to None:

      if let Some(file) = LocaleFiles::get(&filename) {
          let xml = std::str::from_utf8(file.data.as_ref()).ok()?; // parse/utf8 error -> None
          return parse_locale_xml(xml).ok(); // parse error -> None
      }
      // ...
      None // not found -> None

- load_locale (lines ~41–62) tries the exact tag, then the base language, then returns None, silently leaving the manager's locales map empty.
- The consumers substitute hardcoded English, again silently:
  - get_quote_config (lines ~165–185): .unwrap_or_else(|| "\u{201C}".to_string()) (English curly quotes), etc.
  - get_term (lines ~79–86): explicit "Fall back to en-US".
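To make the collapse concrete, here is a minimal, self-contained sketch of the same `.ok()?` pattern. Everything in it is a stand-in: `embedded_files`, `parse_toy`, `load_lossy`, and the file names are hypothetical, not the crate's actual API, and the "parser" only checks for a `<locale` prefix. It shows that a valid file loads, while a missing file, a non-UTF-8 file, and an unparseable file all yield the same `None`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the embedded file table (not the real `LocaleFiles`).
fn embedded_files() -> HashMap<&'static str, &'static [u8]> {
    let mut files: HashMap<&'static str, &'static [u8]> = HashMap::new();
    files.insert("locales-fr-FR.xml", b"<locale xml:lang=\"fr-FR\"/>"); // valid
    files.insert("locales-de-DE.xml", b"\xff\xfe"); // not valid UTF-8
    files.insert("locales-es-ES.xml", b"not xml at all"); // fails the toy parser
    files
}

/// Toy parser, for illustration only: accepts input that starts with `<locale`.
fn parse_toy(xml: &str) -> Result<String, String> {
    if xml.trim_start().starts_with("<locale") {
        Ok(xml.to_string())
    } else {
        Err("expected <locale> root element".to_string())
    }
}

/// Mirrors the shape of the excerpt above: every failure becomes `None`.
fn load_lossy(filename: &str) -> Option<String> {
    let files = embedded_files();
    let bytes: &[u8] = files.get(filename).copied()?; // not found -> None
    let xml = std::str::from_utf8(bytes).ok()?; // invalid UTF-8 -> None
    parse_toy(xml).ok() // parse error -> None
}
```

A caller of `load_lossy` sees `None` for `locales-de-DE.xml`, `locales-es-ES.xml`, and an unknown name alike, so it has no basis for choosing between a warning and a hard error.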
Why this is more than a missing-locale convenience
The locale files are compiled into the binary via rust-embed (#[derive(Embed)] #[folder = "locales/"]). In a correct build they are always present. So a None for a known-embedded locale does not mean "the user asked for a locale we don't ship"; it means the embedded data failed to load or parse, i.e. a bug or broken build. The current code cannot tell these two cases apart and treats both as "use English."
How it arose (and why it's worth fixing)
Discovered 2026-06-02 while working on an unrelated feature. The workspace test suite (cargo nextest run --workspace) had stale target/ build artifacts whose env!("CARGO_MANIFEST_DIR") pointed at a deleted git worktree (a separate issue: build-cache pollution). For the locale tests, rust-embed's dynamic (debug-mode) loader tried to read locale XML from that dead path, failed, and load_embedded_locale returned None → the manager fell back to English → test_get_quote_config_french failed with: Expected French open-quote to contain «, got "“".
The key observation: the only reason the silent degradation was caught at all is that a test happened to assert on French content. In production there is no such assertion — a real document would simply render with the wrong language and ship. The silent fallback actively hid a real load failure and made it look like a flaky test.
Proposed fix direction
- Distinguish the two cases in the loader:
  - genuinely unshipped locale → fall back to en-US and emit a DiagnosticMessage warning ("locale xx-YY not available, using en-US").
  - embedded locale that fails to load/parse → surface loudly (hard error, or at minimum a logged error), because it indicates a bug/broken build, not a user choice.
- Independently of the above, emit a diagnostic whenever the rendered language differs from the requested language, so silent language substitution can never happen without a trace.
- Stop collapsing UTF-8/parse errors into None indiscriminately; at least distinguish "not found" from "found but failed to parse."
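One possible shape for this direction, as a rough sketch rather than a proposed patch: the loader returns a typed error, and only the "not shipped" case is allowed to fall back to en-US, with a warning attached. The names `LocaleLoadError`, `Resolved`, `load_embedded`, and `resolve` are hypothetical, the warning is a plain `String` standing in for whatever `DiagnosticMessage` actually looks like, and the parse step is a toy check, not the real `parse_locale_xml`.

```rust
/// Why a locale could not be loaded. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum LocaleLoadError {
    /// No embedded file for this tag: a user-facing condition, safe to fall back.
    NotShipped,
    /// The file exists but is not valid UTF-8: points at a bug or broken build.
    InvalidUtf8,
    /// The file exists but did not parse: points at a bug or broken build.
    Parse(String),
}

/// Outcome of resolving a requested tag: which locale is in effect, plus warnings.
#[derive(Debug, PartialEq)]
struct Resolved {
    locale_tag: String,
    warnings: Vec<String>,
}

/// Stand-in for "look up in the embed, decode, parse". `raw` is what the embed
/// lookup returned; the parse is a toy check for a `<locale` prefix.
fn load_embedded(raw: Option<&[u8]>) -> Result<String, LocaleLoadError> {
    let bytes = raw.ok_or(LocaleLoadError::NotShipped)?;
    let xml = std::str::from_utf8(bytes).map_err(|_| LocaleLoadError::InvalidUtf8)?;
    if xml.trim_start().starts_with("<locale") {
        Ok(xml.to_string())
    } else {
        Err(LocaleLoadError::Parse("expected <locale> root element".to_string()))
    }
}

/// Falls back to en-US only for unshipped locales, and records a warning when
/// it does. Load and parse failures are propagated instead of being hidden.
fn resolve(tag: &str, raw: Option<&[u8]>) -> Result<Resolved, LocaleLoadError> {
    match load_embedded(raw) {
        Ok(_) => Ok(Resolved {
            locale_tag: tag.to_string(),
            warnings: Vec::new(),
        }),
        Err(LocaleLoadError::NotShipped) => Ok(Resolved {
            locale_tag: "en-US".to_string(),
            warnings: vec![format!("locale {} not available, using en-US", tag)],
        }),
        Err(other) => Err(other),
    }
}
```

With this split, `resolve("xx-YY", None)` succeeds with en-US and one warning, while a shipped but corrupt file returns `Err(InvalidUtf8)` or `Err(Parse(..))` and never silently renders English. Whether the error arm should abort rendering or only log is the open question from the list above; the sketch takes the hard-error option.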
Tracking
- beads: bd-apudk
- related (root cause of the discovery): build-cache pollution from worktrees sharing target/ (beads bd-gpyup)