Skip to content

Fix standalone month alias cache leakage#1263

Open
BlocksecPHD wants to merge 1 commit intopython-babel:masterfrom
BlocksecPHD:fix-hebrew-standalone-month-cache
Open

Fix standalone month alias cache leakage#1263
BlocksecPHD wants to merge 1 commit intopython-babel:masterfrom
BlocksecPHD:fix-hebrew-standalone-month-cache

Conversation

@BlocksecPHD
Copy link
Copy Markdown

Problem background

Using babel.dates.format_date(..., 'LLLL', locale='he') could corrupt later standalone month-name lookups for other locales in the same process. After Hebrew resolved months['stand-alone']['wide'], subsequent LLLL formatting for locales such as no, fr, and de could incorrectly return Hebrew month names.

The underlying issue was in LocaleDataDict.__getitem__(): when an alias resolved to a nested dict, the resolved LocaleDataDict wrapper was memoized back into the underlying cached locale data. That leaked alias resolution state across locales through the shared locale-data cache.

Changes summary

  • Stop memoizing dict-backed alias resolutions back into the underlying locale cache in LocaleDataDict.__getitem__().
  • Add a regression test covering LLLL formatting after Hebrew for de, no, and fr, while preserving the expected Hebrew result itself.

Test results

  • Reproduced the bug on v2.16.0 with:
    • de -> Oktober
    • he -> אוקטובר
    • no -> אוקטובר (wrong before fix)
    • fr -> אוקטובר (wrong before fix)
  • Verified the fix locally on current master:
    • de -> Oktober
    • he -> אוקטובר
    • no -> oktober
    • fr -> octobre
  • Test suite run:
    • pytest tests/test_dates.py -k 'standalone_month_name_isolation or test_format_date' -q
    • pytest tests/test_localedata.py tests/test_core.py tests/test_dates.py -q
    • Result: 3441 passed, 1 skipped, 2 xfailed

Risk assessment

Low risk. The change is narrowly scoped to avoid caching LocaleDataDict wrappers for nested dict values. Non-dict scalar alias resolutions are still memoized as before, and the added regression test specifically covers the cross-locale contamination case.

Comment on lines 271 to +272
if isinstance(val, dict): # Return a nested alias-resolving dict
val = LocaleDataDict(val, base=self.base)
return LocaleDataDict(val, base=self.base)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... This line has been the same ever since this code was added in 4a6c0f5 in 2008. It's curious that this bug has lurked in here for nearly two decades? I wonder how/if this will affect performance, since we'd no longer be caching this LocaleDataDict below?

Further, from the PR description:

the resolved LocaleDataDict wrapper was memoized back into the underlying cached locale data

Why wouldn't that be that locale's cached locale data? What's the mechanism for it to be in, presumably, another locale's cached data?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants