citeproc: locale load failure silently degrades to English citations (no diagnostic) #255

@cscheid

Summary

When a CSL locale fails to load, quarto-citeproc silently falls back to English — English terms (and, et al.), English curly quotes, and English date formats — and emits no warning or error. A document that requests French, German, etc. citations can render with English citation furniture and the user is never told anything went wrong.

This is a correctness + observability bug: the rendered output is wrong, and there is no signal.

Where it happens

crates/quarto-citeproc/src/locale.rs:

  1. load_embedded_locale (lines ~240–266) swallows every failure mode. File-not-found in the embed, non-UTF-8 bytes, and XML parse errors all collapse to None:
    if let Some(file) = LocaleFiles::get(&filename) {
        let xml = std::str::from_utf8(file.data.as_ref()).ok()?;   // non-UTF-8 bytes  -> None
        return parse_locale_xml(xml).ok();                          // parse error      -> None
    }
    // ...
    None                                                            // not found        -> None
  2. load_locale (lines ~41–62) tries the exact tag, then the base language, then returns None — silently leaving the manager's locales map empty.
  3. The consumers substitute hardcoded English, again silently:
    • get_quote_config (lines ~165–185): .unwrap_or_else(|| "\u{201C}".to_string()) (English curly quotes), etc.
    • get_term (lines ~79–86): an explicit "Fall back to en-US" path.

Why this is more than a missing-locale convenience

The locale files are compiled into the binary via rust-embed (#[derive(Embed)] #[folder = "locales/"]). In a correct build they are always present. So a None for a known-embedded locale does not mean "the user asked for a locale we don't ship"; it means the embedded data failed to load or parse, i.e. a bug or a broken build. The current code cannot tell these two cases apart and treats both as "use English."
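One way the two cases could be told apart is to have the loader return a typed error instead of None. The following is a minimal sketch, not the crate's actual code: LocaleLoadError, its variants, is_build_defect, the stand-in Locale struct, the stand-in parser, and the locales-{tag}.xml filename pattern are all assumptions made for illustration, and a plain HashMap stands in for the rust-embed file set so the example runs with the standard library alone.

```rust
use std::collections::HashMap;

/// Why a locale could not be loaded (hypothetical type, not in quarto-citeproc).
#[derive(Debug, PartialEq)]
pub enum LocaleLoadError {
    /// No file for this tag in the embedded set: the locale is not shipped.
    NotShipped { tag: String },
    /// The file exists but its bytes are not valid UTF-8.
    InvalidUtf8 { tag: String },
    /// The file exists and is valid UTF-8, but parsing failed.
    Parse { tag: String, message: String },
}

impl LocaleLoadError {
    /// True when the failure points at a bug or broken build, not a user choice.
    pub fn is_build_defect(&self) -> bool {
        !matches!(self, LocaleLoadError::NotShipped { .. })
    }
}

/// Stand-in for the parsed locale; the real type carries terms, dates, etc.
#[derive(Debug, PartialEq)]
pub struct Locale {
    pub xml: String,
}

/// Stand-in parser (assumption): accepts any input whose root looks like <locale ...>.
fn parse_locale_xml(xml: &str) -> Result<Locale, String> {
    if xml.trim_start().starts_with("<locale") {
        Ok(Locale { xml: xml.to_string() })
    } else {
        Err("missing <locale> root element".to_string())
    }
}

/// Loads one locale and keeps each failure mode distinct.
/// `files` stands in for the embedded file set.
pub fn load_embedded_locale(
    files: &HashMap<String, Vec<u8>>,
    tag: &str,
) -> Result<Locale, LocaleLoadError> {
    // Assumed filename pattern, for illustration only.
    let filename = format!("locales-{tag}.xml");
    let bytes = files
        .get(&filename)
        .ok_or_else(|| LocaleLoadError::NotShipped { tag: tag.to_string() })?;
    let xml = std::str::from_utf8(bytes)
        .map_err(|_| LocaleLoadError::InvalidUtf8 { tag: tag.to_string() })?;
    parse_locale_xml(xml).map_err(|message| LocaleLoadError::Parse {
        tag: tag.to_string(),
        message,
    })
}
```

With this shape a caller can fall back quietly-but-visibly on NotShipped and treat anything for which is_build_defect() returns true as a loud failure.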

How it arose (and why it's worth fixing)

Discovered 2026-06-02 while working on an unrelated feature. The workspace test suite (cargo nextest run --workspace) had stale target/ build artifacts whose env!("CARGO_MANIFEST_DIR") pointed at a deleted git worktree (a separate issue: build-cache pollution). For the locale tests, rust-embed's dynamic (debug-mode) loader tried to read locale XML from that dead path and failed, so load_embedded_locale returned None, the manager fell back to English, and test_get_quote_config_french failed with: Expected French open-quote to contain «, got "“".

The key observation: the only reason the silent degradation was caught at all is that a test happened to assert on French content. In production there is no such assertion — a real document would simply render with the wrong language and ship. The silent fallback actively hid a real load failure and made it look like a flaky test.

Proposed fix direction

  • Distinguish the two cases in the loader:
    • genuinely unshipped locale → fall back to en-US and emit a DiagnosticMessage warning ("locale xx-YY not available, using en-US").
    • embedded locale that fails to load/parse → surface loudly (hard error, or at minimum a logged error), because it indicates a bug/broken build, not a user choice.
  • Independently of the above, emit a diagnostic whenever the rendered language differs from the requested language, so silent language substitution can never happen without a trace.
  • Stop collapsing UTF-8/parse errors into None indiscriminately — at least distinguish "not found" from "found but failed to parse."
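The fallback chain with diagnostics described above could look roughly like the sketch below. Everything named here is hypothetical: Load, Diagnostic, and resolve_locale are illustrative names, Diagnostic is a stand-in for the project's DiagnosticMessage, and the loader is passed in as a closure so the example stays self-contained. The sketch records a broken embedded locale as an error and keeps going; a caller that prefers a hard failure can abort on any Diagnostic::Error.

```rust
/// Outcome of one load attempt (hypothetical; mirrors the cases listed above).
pub enum Load {
    Found,
    /// The locale is simply not part of the shipped set.
    NotShipped,
    /// The locale is shipped but could not be read or parsed.
    Broken(String),
}

/// Stand-in for the project's diagnostic type.
#[derive(Debug, PartialEq)]
pub enum Diagnostic {
    Warning(String),
    Error(String),
}

/// Tries the exact tag, then the base language, then en-US.
/// Returns the tag actually used plus every diagnostic produced on the way.
pub fn resolve_locale(
    requested: &str,
    load: impl Fn(&str) -> Load,
) -> (Option<String>, Vec<Diagnostic>) {
    let mut diags = Vec::new();

    // Build the candidate list: "fr-CA" -> ["fr-CA", "fr", "en-US"].
    let base = requested.split('-').next().unwrap_or(requested);
    let mut candidates = vec![requested.to_string()];
    if base != requested {
        candidates.push(base.to_string());
    }
    if requested != "en-US" {
        candidates.push("en-US".to_string());
    }

    for tag in candidates {
        match load(&tag) {
            Load::Found => {
                // Any substitution leaves a trace, whatever caused it.
                if tag != requested {
                    diags.push(Diagnostic::Warning(format!(
                        "locale {requested} not available, using {tag}"
                    )));
                }
                return (Some(tag), diags);
            }
            // Not shipped is an expected case: move on to the next candidate.
            Load::NotShipped => {}
            // Shipped but unreadable indicates a bug or broken build: say so loudly.
            Load::Broken(why) => diags.push(Diagnostic::Error(format!(
                "embedded locale {tag} failed to load: {why}"
            ))),
        }
    }
    (None, diags)
}
```

Because the warning is keyed on the used tag differing from the requested tag, it fires for both the unshipped case and the broken-build case, which covers the "never substitute a language without a trace" requirement independently of how the loader classifies the failure.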

Tracking

  • beads: bd-apudk
  • related (root cause of the discovery): build-cache pollution from worktrees sharing target/ — beads bd-gpyup
