[5924759] Fix fp16 ONNX INT8 entropy calibration on numpy >= 2.0 by ajrasane · Pull Request #1558 · NVIDIA/Model-Optimizer

ajrasane · 2026-05-28T15:49:17Z

What does this PR do?

Type of change: Bug fix

INT8 entropy calibration of fp16 ONNX models (e.g. ConvNext / EfficientViT / YOLOv8 backbones quantized via python -m modelopt.onnx.quantization --quantize_mode=int8) used to fail during histogram collection with:

ValueError: Too many bins for data range. Cannot create 128 finite-sized bins.

_collect_value in modelopt/onnx/quantization/ort_patching.py derives threshold = max(abs(min), abs(max)) from the activation tensor and passes range=(-threshold, threshold) to np.histogram(...). When the model is fp16 and a calibrated activation has a small range (≲ 1e-5), both endpoints inherit the fp16 dtype. Under NumPy 2.0's NEP 50 promotion rules, the resulting fp16 linspace collapses consecutive edges of the 128-bin histogram to the same value, and NumPy refuses to build the histogram. NumPy 1.x silently used a higher-precision intermediate dtype, which masked the issue.
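To make the failure mode concrete, here is a minimal sketch that mimics a linspace-style edge construction with every intermediate held in a chosen dtype. It is not NumPy's internal code: the helper name `edges_in_dtype` and the threshold value are illustrative assumptions. With an fp16 threshold near 1e-5, the per-bin step is rounded to the fp16 grid, so the edge sequence stops being strictly increasing, which is the condition np.histogram rejects.

```python
import numpy as np


def edges_in_dtype(threshold, num_bins, dtype):
    """Build num_bins + 1 edges over [-threshold, threshold], keeping all math in `dtype`.

    Hypothetical helper for illustration only; it imitates a linspace-style
    computation (start + k * step, last edge pinned to stop) rather than
    calling NumPy's own implementation.
    """
    start = np.array([-threshold], dtype=dtype)
    stop = np.array([threshold], dtype=dtype)
    step = (stop - start) / dtype(num_bins)  # step is rounded to the dtype's grid
    k = np.arange(num_bins + 1, dtype=dtype)
    edges = start + k * step
    edges[-1] = stop[0]  # linspace-style: pin the final edge to the endpoint
    return edges


threshold = np.float16(1e-5)  # assumed narrow activation range
fp16_edges = edges_in_dtype(threshold, 128, np.float16)
fp64_edges = edges_in_dtype(float(threshold), 128, np.float64)

fp16_ok = bool(np.all(np.diff(fp16_edges) > 0))
fp64_ok = bool(np.all(np.diff(fp64_edges) > 0))
print("fp16 edges strictly increasing:", fp16_ok)
print("fp64 edges strictly increasing:", fp64_ok)
```

Because the arithmetic here runs on arrays with an explicit dtype, the comparison should not depend on which NumPy promotion rules are active.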

The fix casts the range endpoints to Python float so numpy computes bin edges in float64 regardless of input dtype. Applied at both call sites: _collect_value and the single-node variant _collect_value_histogram_collector_single_node_calibration.
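As a rough sketch of the cast described above, the snippet below calls np.histogram directly rather than going through _collect_value, whose signature is not shown here. The array contents and variable names are assumptions chosen to resemble the regression test: an fp16 tensor of mostly zeros plus one value near 1e-5.

```python
import numpy as np

num_bins = 128

# Assumed test data: fp16 activations, mostly zeros with one tiny value.
data_arr = np.zeros(1024, dtype=np.float16)
data_arr[0] = np.float16(1e-5)

# threshold is still an np.float16 scalar at this point.
threshold = max(abs(data_arr.min()), abs(data_arr.max()))

# Cast to a Python float so the range endpoints do not carry the fp16 dtype.
range_max = float(threshold)
hist, hist_edges = np.histogram(data_arr, num_bins, range=(-range_max, range_max))

edges_increasing = bool(np.all(np.diff(hist_edges) > 0))
print(hist.sum(), len(hist_edges), edges_increasing)
```

The three printed values correspond to the kinds of checks a regression test would make: every sample is counted, there are num_bins + 1 edges, and no bin has zero width.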

Usage

# The affected workflow — INT8 entropy calibration of any fp16 ONNX model:
python -m modelopt.onnx.quantization \
    --quantize_mode=int8 \
    --onnx_path=model.fp16.onnx \
    --calibration_data_path=calib.npy

No API change.

Testing

Added test_collect_value_fp16_narrow_range in tests/gpu/onnx/test_ort_patching.py, which calls _collect_value with an fp16 tensor (mostly zeros plus one ~1e-5 value) and asserts that the histogram is built without raising and that all bin edges are distinct. It fails on the buggy code and passes after the fix.
Reproduced the original failure on numpy 2.2.6 before the fix.
Full tests/gpu/onnx/test_ort_patching.py suite (31 tests) passes.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ✅
Did you update Changelog?: ✅

Summary by CodeRabbit

Bug Fixes
- Fixed INT8 entropy calibration for fp16 ONNX models failing with NumPy >= 2.0. Histogram range computation now correctly handles fp16 activations with small dynamic ranges.
Tests
- Added test coverage for INT8 calibration with fp16 activations using narrow value ranges.

_collect_value derives threshold from activation min/max. For fp16 activations with a small range, both np.histogram range endpoints inherit fp16 dtype. Under numpy 2.0 NEP-50 promotion the resulting fp16 linspace collapses 128 bin edges and numpy raises "Too many bins for data range". Cast range to Python float so bin edges are computed in float64. Same fix applied to the single-node calibration variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>

coderabbitai · 2026-05-28T15:49:33Z

Walkthrough

This PR fixes INT8 entropy calibration failures on fp16 ONNX models running with NumPy >= 2.0. The fix casts histogram range endpoints to Python floats in two histogram collection code paths to ensure proper bin-edge computation, includes a regression test for fp16 narrow-range inputs, and documents the issue in release notes.

Changes

NumPy 2.0 histogram compatibility

Layer / File(s)	Summary
Histogram range endpoint float casting `modelopt/onnx/quantization/ort_patching.py`	Real-value and single-node histogram collection paths both cast the computed symmetric range endpoint to Python `float` before passing to `np.histogram`, preventing bin-edge collapse caused by inheriting narrow fp16 dtype precision.
FP16 narrow-range histogram regression test `tests/gpu/onnx/test_ort_patching.py`	New test `test_collect_value_fp16_narrow_range` validates histogram collection on FP16 activations with extremely small dynamic range, confirming bin edges do not collapse, histogram counts sum correctly, and edge count matches expected value.
Release notes documentation `CHANGELOG.rst`	Bug-fix entry documents the INT8 entropy calibration failure on NumPy >= 2.0 with fp16 ONNX models and references the histogram range endpoint casting fix in `ort_patching`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and specifically describes the main fix: addressing a NumPy >= 2.0 compatibility issue with fp16 ONNX INT8 entropy calibration, which matches the core changes across all modified files.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR has no security anti-patterns: no unsafe deserialization, eval/exec, hardcoded trust flags, or # nosec bypasses. Changes safely cast fp16 threshold with inline comments.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/gpu/onnx/test_ort_patching.py (1)
156-170: ⚡ Quick win

Extend regression coverage to the single-node histogram path too.

This test validates _collect_value, but the same float-cast fix was also applied to _collect_value_histogram_collector_single_node_calibration. Consider parameterizing this test (or adding a sibling test) to assert identical fp16 narrow-range behavior there as well.

As per coding guidelines, “Use pytest for new/updated tests; keep tests lean and add coverage that guards the expected behavior/regression.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/gpu/onnx/test_ort_patching.py` around lines 156 - 170, Add the same
fp16 narrow-range assertion for the single-node histogram path by running the
existing assertions against the alternative collector entry point
_collect_value_histogram_collector_single_node_calibration (either by
parameterizing test_collect_value_fp16_narrow_range with both _collect_value and
_collect_value_histogram_collector_single_node_calibration or by adding a
sibling test). Ensure you invoke the same setup (activations, name_to_arr,
mock_histogram_collector), call the single-node function, then assert hist.sum()
equals activations.size, len(edges) equals mock_histogram_collector.num_bins +
1, and no zero-width bins (not np.any(np.diff(edges) == 0)) to mirror the
original test’s checks.

📥 Commits

Reviewing files that changed from the base of the PR and between b49f9b9 and c4ceddc.

📒 Files selected for processing (3)

CHANGELOG.rst
modelopt/onnx/quantization/ort_patching.py
tests/gpu/onnx/test_ort_patching.py

github-actions · 2026-05-28T15:53:33Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1558/
Built to branch `gh-pages` at 2026-05-28 15:53 UTC. Preview will be ready when the GitHub Pages deployment is complete.

cjluo-nv

Bot review — DM the bot to share feedback.

Small, well-scoped bug fix for fp16 ONNX INT8 entropy calibration on numpy >= 2.0. The root-cause analysis (NEP-50 strict promotion causing fp16 linspace bin-edge collapse) is correct. The fix (casting threshold to Python float so np.histogram computes edges in float64) is minimal and is applied at both relevant call sites in ort_patching.py. The new test_collect_value_fp16_narrow_range reproduces the failure mode and asserts no bin-edge collapse. CHANGELOG is updated. No API change, no licensing impact.

codecov · 2026-05-28T16:02:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.45%. Comparing base (b49f9b9) to head (c4ceddc).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1558      +/-   ##
==========================================
+ Coverage   69.43%   69.45%   +0.02%     
==========================================
  Files         477      477              
  Lines       51977    51979       +2     
==========================================
+ Hits        36090    36104      +14     
+ Misses      15887    15875      -12

Flag	Coverage Δ
examples	`33.65% <0.00%> (+0.82%)`	⬆️
gpu	`50.99% <100.00%> (-0.56%)`	⬇️
unit	`52.76% <50.00%> (+<0.01%)`	⬆️

gcunhase · 2026-05-28T17:49:31Z

+            range_max = float(threshold)
            hist, hist_edges = np.histogram(
-                data_arr, histogram_collector.num_bins, range=(-threshold, threshold)
+                data_arr, histogram_collector.num_bins, range=(-range_max, range_max)


This code duplicates the logic in _collect_value(). Can we factor it out into a shared utility function?
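One possible shape for such a shared helper is sketched below. The function name `_symmetric_histogram` and its signature are hypothetical and do not come from the PR; only the float cast and the np.histogram call mirror the diff above.

```python
import numpy as np


def _symmetric_histogram(data_arr, num_bins, threshold):
    """Histogram `data_arr` over [-threshold, threshold] (hypothetical helper).

    The endpoint is cast to a Python float so an fp16 `threshold` does not
    force fp16 range endpoints on np.histogram.
    """
    range_max = float(threshold)
    return np.histogram(data_arr, num_bins, range=(-range_max, range_max))


# Both call sites could then share one call (names here are illustrative).
activations = np.array([0.0, 0.0, 1e-5, -5e-6], dtype=np.float16)
threshold = np.abs(activations).max()
hist, hist_edges = _symmetric_histogram(activations, 128, threshold)
```

Keeping the cast in one place would mean a future change to the range computation only needs to be made once.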

gcunhase

LGTM after minor comment is addressed.

ajrasane requested a review from a team as a code owner May 28, 2026 15:49

ajrasane requested a review from vishalpandya1990 May 28, 2026 15:49

coderabbitai Bot reviewed May 28, 2026

coderabbitai Bot approved these changes May 28, 2026

cjluo-nv approved these changes May 28, 2026

ajrasane enabled auto-merge (squash) May 28, 2026 16:00

gcunhase reviewed May 28, 2026

gcunhase approved these changes May 28, 2026

ajrasane wants to merge 1 commit into main from ajrasane/nvbug-5924759-fp16-histogram
