[ROCm]: fix: JAX/TE sharding compatibility and tmem reduction foundations (PR1) by cj401-amd · Pull Request #4191 · AI-Hypercomputer/maxtext

cj401-amd · 2026-06-17T22:51:42Z

Summary

- Sharding: filter logical_axis_rules to only include axes present in the mesh, preventing crashes when fsdp_transpose or other axes are absent
- Sharding: add skip_trivial_specs parameter to maybe_shard_with_logical to skip no-op resharding constraints (all-None PartitionSpecs), reducing XLA overhead
- RMSNorm: use reshard() for explicit shard mode; replace jnp.einsum scale application with direct multiply to avoid unnecessary XLA ops
- Attention (TE): for synthetic data, use mask_type="causal" directly instead of materializing the full [seq, seq] attention mask; this avoids ~5 GiB of temp memory from XLA loop_broadcast_fusion hoisting the mask into the pipeline scan carry
- Attention (TE): remove deprecated scale_factor and context_parallel_strategy params from DotProductAttention for newer TransformerEngine compatibility
- Train step: skip identity grad_dtype cast when grad_dtype == float32; set flax_always_shard_variable=False
- train_compile.py: handle serialize() API change (tuple vs bytes return type); pyink formatting
- Config: add pipeline_save_decoder_layer_input flag (used by PR 2)
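
The two sharding items in the summary can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: `filter_rules_to_mesh` and `is_trivial_spec` are hypothetical helper names, the rule names are illustrative, and dropping a rule whose axes are all absent (rather than mapping it to None) is an assumption.

```python
from typing import List, Optional, Sequence, Tuple, Union

# A rule maps one logical axis name to a mesh axis, several mesh axes, or None.
MeshAxes = Union[None, str, Tuple[str, ...]]
Rule = Tuple[str, MeshAxes]


def filter_rules_to_mesh(rules: Sequence[Rule], mesh_axis_names: Sequence[str]) -> List[Rule]:
    """Keep only the mesh axes that exist in the mesh (hypothetical helper)."""
    present = set(mesh_axis_names)
    filtered: List[Rule] = []
    for logical_name, mesh_axes in rules:
        if mesh_axes is None:
            # Already replicated, nothing to filter.
            filtered.append((logical_name, None))
        elif isinstance(mesh_axes, str):
            # Assumption: a rule whose only axis is absent is dropped entirely.
            if mesh_axes in present:
                filtered.append((logical_name, mesh_axes))
        else:
            kept = tuple(axis for axis in mesh_axes if axis in present)
            if kept:
                filtered.append((logical_name, kept))
    return filtered


def is_trivial_spec(spec: Sequence[Optional[object]]) -> bool:
    """True when every entry is None, i.e. the constraint would shard nothing."""
    return all(entry is None for entry in spec)


# Illustrative rules; the mesh below has no fsdp_transpose axis.
rules = [
    ("activation_batch", ("data", "fsdp", "fsdp_transpose")),
    ("embed", "fsdp_transpose"),
    ("mlp", "tensor"),
    ("norm", None),
]
mesh_axis_names = ("data", "fsdp", "tensor")
filtered_rules = filter_rules_to_mesh(rules, mesh_axis_names)
```

With such a check, a caller could return the input unchanged whenever `is_trivial_spec(spec)` holds instead of emitting a sharding constraint, which is the behavior the skip_trivial_specs item describes.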

Test plan

- python3 -m pytest tests/unit/train_compile_test.py -v -k "test_save_compiled_v5e or test_save_compiled_v4"
- Smoke-test training with pipeline parallelism config
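
The test_save_compiled cases above exercise the path touched by the serialize() item in the summary. A hedged sketch of handling both return shapes follows; `extract_serialized_bytes` is a hypothetical name, and treating the first tuple element as the payload is an assumption, not something this PR page confirms.

```python
def extract_serialized_bytes(result):
    """Return the serialized payload whether `result` is bytes or a tuple.

    Assumption: when a tuple is returned, the payload is its first element.
    """
    if isinstance(result, (bytes, bytearray)):
        return bytes(result)
    if isinstance(result, tuple) and result and isinstance(result[0], (bytes, bytearray)):
        return bytes(result[0])
    raise TypeError(f"unexpected serialize() result type: {type(result).__name__}")


# Both shapes normalize to the same bytes object.
from_bytes = extract_serialized_bytes(b"compiled")
from_tuple = extract_serialized_bytes((b"compiled", "in_tree", "out_tree"))
```

Raising on any other type keeps a future API change from silently writing a malformed file.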

codecov · 2026-06-18T22:43:48Z

Codecov Report

❌ Patch coverage is 44.82759% with 16 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/maxtext/layers/attention_op.py | 0.00% | 7 Missing ⚠️ |
| src/maxtext/layers/normalizations.py | 54.54% | 4 Missing and 1 partial ⚠️ |
| src/maxtext/trainers/pre_train/train.py | 42.85% | 3 Missing and 1 partial ⚠️ |
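
As a sanity check on the report's numbers, the sketch below adds up the per-file counts, assuming partial lines count toward the 16 uncovered lines. The total of 29 changed lines is inferred from the reported percentage and does not appear in the report itself.

```python
# (missing, partial) line counts per file, as listed in the report.
rows = {
    "src/maxtext/layers/attention_op.py": (7, 0),
    "src/maxtext/layers/normalizations.py": (4, 1),
    "src/maxtext/trainers/pre_train/train.py": (3, 1),
}

# Assumption: partial lines are counted as uncovered.
uncovered = sum(missing + partial for missing, partial in rows.values())


def patch_coverage(total_lines: int, uncovered_lines: int) -> float:
    """Percentage of changed lines that are covered."""
    return 100.0 * (total_lines - uncovered_lines) / total_lines


# Inferred total: 29 changed lines reproduces the reported 44.82759%.
inferred_total = 29
coverage = round(patch_coverage(inferred_total, uncovered), 5)
```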

cj401-amd requested a review from NuojCheng June 17, 2026 22:51

cj401-amd changed the title ~~[ROCm]: fix: JAX/TE sharding compatibility and tmem reduction foundations~~ [ROCm]: fix: JAX/TE sharding compatibility and tmem reduction foundations (PR1) Jun 17, 2026

cj401-amd added 2 commits June 19, 2026 06:39

fix: JAX/TE compatibility - sharding, reshard, serialize API, fsdp_transpose

81099ec

fix: add pipeline_save_decoder_layer_input config field to branch 1 so yml can reference it

2f5fa82

cj401-amd force-pushed the cj/tmem-fixes-clean-1-jax-sharding-compat branch from 4f116aa to 2f5fa82 Compare June 18, 2026 22:39

cj401-amd added 3 commits June 20, 2026 06:05

Update HLO reference files after tmem fixes

68696cc

test: update HLO references for deepseek3, llama3_8b, qwen3_1.7b after tmem optimizations

9f97e7e

Update HLO reference files

a779105

cj401-amd wants to merge 5 commits into AI-Hypercomputer:main from cj401-amd:cj/tmem-fixes-clean-1-jax-sharding-compat
