fix(quant): sync NVFP4StaticQuantizer global_amax across TP and EP by cjluo-nv · Pull Request #1553 · NVIDIA/Model-Optimizer

cjluo-nv · 2026-05-28T06:50:27Z

What does this PR do?

Type of change: Bug fix

Background. During max_calibrate, promote_nvfp4_static_quantizers(model) is called before the distributed-sync block. Promotion registers _global_amax on each NVFP4StaticQuantizer from reduce_amax(local _amax, axis=None) — i.e. the max over this rank's per-block _amax. With TP (weight sharded across Cout or Cin) or MoE EP (different experts per rank), each rank's local _amax covers a different slice of the global weight, so _global_amax diverges across ranks.

The existing sync block only all-reduces _amax:

sync_quantizer_amax_across_dp_ep syncs _amax across DP/EP.
sync_quantizer_amax_across_tp syncs _amax across TP.

Nothing in the pipeline touches _global_amax after promotion, so the upper level of the two-level NVFP4 scale ends up TP/EP-layout-dependent. This violates the stated invariant on model_calib.py:303 ("TP=8 ↔ TP=4 ↔ TP=8 should give the same quantization parameters") and produces inconsistent block-scale FP8 grids across MoE experts when EP > 1.

Fix. Add NVFP4StaticQuantizer.sync_global_amax_across_distributed_group mirroring TensorQuantizer.sync_amax_across_distributed_group, then call it from both sync_quantizer_amax_across_dp_ep (EP group) and sync_quantizer_amax_across_tp (TP group), right after the existing _amax sync. DP is omitted on purpose: weights are replicated across DP, so each rank already has the same local _amax and therefore the same locally-computed _global_amax.

Usage

No API change. The existing calibration flow now produces consistent _global_amax across TP/EP automatically:

import modelopt.torch.quantization as mtq
mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
# Under TP and/or EP, every rank now holds the same _global_amax on
# each NVFP4StaticQuantizer after max_calibrate / mse_calibrate.

Testing

pre-commit run --files modelopt/torch/quantization/model_calib.py modelopt/torch/quantization/nn/modules/tensor_quantizer.py — passes (ruff / mypy / format / bandit clean).
pytest tests/unit/torch/quantization/test_calib.py -q — 12 passed, 1 skipped (CUDA-only).
Multi-rank verification: deferred (draft). Planning to add a dist-launched unit test that runs max_calibrate under fake TP and EP groups and asserts _global_amax is bit-identical across ranks for NVFP4-static weight quantizers.

Before your PR is "Ready for review"

Is this change backward compatible?: ✅ — purely additive; only behavior change is that _global_amax is now consistent across TP/EP ranks, which was the documented intent.
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ❌ — draft; will add multi-rank test before flipping to ready-for-review.
Did you update Changelog?: ❌ — will add a bug-fix entry before flipping to ready-for-review.
Did you get Claude approval on this PR?: N/A — draft.

Additional Information

Related: #1536 (in-flight refactor of static-NVFP4 finalization). That PR consolidates promote_nvfp4_static_quantizers + _sync_grouped_weight_global_amax into max_calibrate but does not change the promote-vs-TP/EP-sync ordering and does not add a cross-TP/EP _global_amax reduction — _sync_grouped_weight_global_amax only unifies grouped Q/K/V/gate/up siblings within a rank. This PR layers cleanly on top of #1536 or stands alone on main.

🤖 Generated with Claude Code

`promote_nvfp4_static_quantizers` registers `_global_amax` from `reduce_amax(local _amax)` before the distributed-sync block in `max_calibrate`. With TP (different weight shards per rank) or MoE EP (different experts per rank), each rank's local `_amax` covers a different subset of the global weight, so the resulting `_global_amax` diverges across ranks. The existing TP/EP sync only touches `_amax`, leaving `_global_amax` permanently rank-local — making the upper level of the two-level NVFP4 scale TP/EP-layout-dependent. Add `NVFP4StaticQuantizer.sync_global_amax_across_distributed_group` (mirroring the existing `sync_amax_across_distributed_group` on `TensorQuantizer`) and call it from both `sync_quantizer_amax_across_dp_ep` (EP group) and `sync_quantizer_amax_across_tp` (TP group) alongside the existing `_amax` sync. DP doesn't need a separate call because weights are replicated across DP, so `_global_amax` is naturally identical. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>

copy-pr-bot · 2026-05-28T06:50:31Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-28T06:50:34Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 16f206ab-1dc4-43eb-9e52-ec79534a5d67

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/nvfp4-static-global-amax-tp-ep-sync

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-28T06:54:56Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1553/
Built to branch `gh-pages` at 2026-05-28 06:54 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-05-28T07:03:12Z

Codecov Report

❌ Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.96%. Comparing base (a5bc6f8) to head (a03db8e).
⚠️ Report is 18 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/quantization/model_calib.py	25.00%	3 Missing ⚠️
.../torch/quantization/nn/modules/tensor_quantizer.py	75.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1553      +/-   ##
==========================================
- Coverage   76.79%   75.96%   -0.83%     
==========================================
  Files         474      477       +3     
  Lines       51560    53862    +2302     
==========================================
+ Hits        39593    40916    +1323     
- Misses      11967    12946     +979

Flag	Coverage Δ
unit	`52.75% <50.00%> (+0.12%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(quant): sync NVFP4StaticQuantizer global_amax across TP and EP#1553

fix(quant): sync NVFP4StaticQuantizer global_amax across TP and EP#1553
cjluo-nv wants to merge 1 commit into
mainfrom
fix/nvfp4-static-global-amax-tp-ep-sync

cjluo-nv commented May 28, 2026

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Review skipped

Uh oh!

github-actions Bot commented May 28, 2026

Built to branch `gh-pages` at 2026-05-28 06:54 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov Bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cjluo-nv commented May 28, 2026

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Review skipped

Uh oh!

github-actions Bot commented May 28, 2026

Built to branch gh-pages at 2026-05-28 06:54 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov Bot commented May 28, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Built to branch `gh-pages` at 2026-05-28 06:54 UTC.
Preview will be ready when the GitHub Pages deployment is complete.