
Fix compile_cache_test to assert single jit_train_step cache file #4210

Open
Liauuu wants to merge 1 commit into AI-Hypercomputer:main from Liauuu:fix/compile-cache-train-step-count

Liauuu commented Jun 20, 2026

Description

Fixes a failure in the nightly cpu-unit job (Build Nightly Docker Images, commit 85690dab).

compile_cache_test.py assumed that the JAX persistent compilation cache directory would contain exactly one file in total. In nightly CI, however, two entries are written:

  • jit_train_step-...
  • jit_initialize_state-...

initialize_state is a separate JIT-compiled function used during model state initialization. It is unrelated to the train_step double-compilation regression this test is meant to catch, so requiring a total cache file count of 1 fails even when train_step caching behaves correctly.

This change updates the test to:

  • Assert exactly one cache file whose name starts with jit_train_step
  • Keep the existing log assertion that runtime execution hits the persistent cache for jit_train_step

Other cache entries (e.g. jit_initialize_state) are allowed and no longer cause a false failure.
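As a rough illustration of the prefix-filtered check described above (a minimal sketch, not the actual test code in this PR): the helper names `train_step_cache_files` and `assert_single_train_step_entry` are hypothetical, and the file names in the demo use placeholder hashes rather than real cache keys.

```python
import os
import tempfile


def train_step_cache_files(cache_dir, prefix="jit_train_step"):
    """Return the sorted cache entries in cache_dir whose names start with prefix."""
    return sorted(name for name in os.listdir(cache_dir) if name.startswith(prefix))


def assert_single_train_step_entry(cache_dir):
    """Fail only if jit_train_step was cached zero times or more than once.

    Entries for other JIT-compiled functions (e.g. jit_initialize_state)
    are ignored, so they cannot trigger a false failure.
    """
    matches = train_step_cache_files(cache_dir)
    assert len(matches) == 1, (
        f"Expected exactly 1 jit_train_step cache file, "
        f"but found {len(matches)}: {matches}"
    )


if __name__ == "__main__":
    # Hypothetical demo: simulate the nightly CI cache directory with
    # placeholder file names (the hashes are not real cache keys).
    with tempfile.TemporaryDirectory() as cache_dir:
        for name in ("jit_train_step-aaaa-cache", "jit_initialize_state-bbbb-cache"):
            open(os.path.join(cache_dir, name), "w").close()
        assert_single_train_step_entry(cache_dir)  # passes despite two files
```

Filtering by prefix keeps the test focused on the regression it targets: a second `jit_train_step-...` entry would still fail the assertion, while unrelated cache entries would not.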

Context

Related to #4000. This is a partial fix: the MoE and GPU integration failures in the same nightly run are separate issues.

Error log (before fix)

FAILED tests/unit/compile_cache_test.py::test_train_step_cache_hit - AssertionError: Expected exactly 1 JAX compilation cache file, but found 2: ['jit_train_step-425341b98f4b4ea0715158828c3b68ee2deab5f2df3c74ad475a08c09652d19f-cache', 'jit_initialize_state-68b1de74b0881dd7ff1205f613bc9988724719034daf764e4c2f6893b61098ca-cache']. This indicates a cache miss where AOT compilation and runtime execution generated different keys, causing train_step to be compiled twice (double-compilation regression). assert 2 == 1
