
[5924759] Fix fp16 ONNX INT8 entropy calibration on numpy >= 2.0 #1558

Open
ajrasane wants to merge 1 commit into main from ajrasane/nvbug-5924759-fp16-histogram


@ajrasane (Contributor) commented May 28, 2026

What does this PR do?

Type of change: Bug fix

INT8 entropy calibration of fp16 ONNX models (e.g. ConvNeXt / EfficientViT / YOLOv8 backbones quantized via python -m modelopt.onnx.quantization --quantize_mode=int8) used to fail during histogram collection with:

ValueError: Too many bins for data range. Cannot create 128 finite-sized bins.

_collect_value in modelopt/onnx/quantization/ort_patching.py derives threshold = max(abs(min), abs(max)) from the activation tensor and passes range=(-threshold, threshold) to np.histogram(...). When the model is fp16, both range endpoints inherit the fp16 dtype of the activation. Under numpy 2.0's NEP-50 strict promotion, the bin edges are then computed in fp16, and when a calibrated activation has a small range (≲ 1e-5), consecutive edges of the 128 bins collapse to the same value, so numpy refuses to build the histogram. numpy 1.x silently used a higher-precision intermediate dtype, masking the issue.
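The promotion change behind this bug can be checked in isolation. The sketch below assumes only NumPy and uses illustrative fp16 activations (zeros plus one ~1e-5 value, not real model data). It shows that max(abs(min), abs(max)) over an fp16 array yields a numpy.float16 scalar, that mixing that scalar with a Python float stays fp16 under NumPy >= 2.0 (NEP 50) but was promoted to float64 under NumPy 1.x, and that float(threshold) detaches the value from the fp16 dtype.

```python
import numpy as np

# Illustrative fp16 activation: mostly zeros plus one tiny value (not real model data).
activations = np.zeros(1024, dtype=np.float16)
activations[0] = 1e-5

# Threshold derived as in the PR description: max(abs(min), abs(max)).
# Builtin max/abs on numpy scalars return a numpy.float16 scalar here.
threshold = max(abs(activations.min()), abs(activations.max()))

numpy_major = int(np.__version__.split(".")[0])

# Mixing the fp16 scalar with a Python float:
# - NumPy >= 2.0 (NEP 50): the Python float is "weak", so the result stays float16.
# - NumPy 1.x (legacy rules): scalar-only arithmetic promotes to float64.
mixed_dtype = (threshold * 1.0).dtype

# Casting to a Python float first detaches the value from the fp16 dtype;
# a standalone Python float is treated as float64 by NumPy.
detached_dtype = np.asarray(float(threshold)).dtype

print(type(threshold).__name__, mixed_dtype, detached_dtype)
```

On NumPy >= 2.0 the middle value printed is float16; on NumPy 1.x it is float64, which is the difference that previously masked the issue.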

The fix casts the range endpoints to Python float so numpy computes bin edges in float64 regardless of input dtype. It is applied at both call sites: _collect_value and the single-node variant _collect_value_histogram_collector_single_node_calibration.

Usage

# The affected workflow — INT8 entropy calibration of any fp16 ONNX model:
python -m modelopt.onnx.quantization \
    --quantize_mode=int8 \
    --onnx_path=model.fp16.onnx \
    --calibration_data_path=calib.npy

No API change.

Testing

  • Added test_collect_value_fp16_narrow_range in tests/gpu/onnx/test_ort_patching.py that calls _collect_value with an fp16 tensor (mostly zeros + one ~1e-5 value) and asserts the histogram is built without raising and all bin edges are distinct. Fails on the buggy code, passes after the fix.
  • Reproduced the original failure on numpy 2.2.6 before the fix.
  • Full tests/gpu/onnx/test_ort_patching.py suite (31 tests) passes.
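The assertions described in the first bullet can be approximated without modelopt by calling np.histogram directly with the fixed, Python-float range. This is a sketch, not the PR's test: it does not call _collect_value, and the test name, the 128-bin count, and the sample data are assumptions chosen to mirror the description.

```python
import numpy as np

NUM_BINS = 128  # assumed bin count, matching the error message quoted in the PR


def test_fp16_narrow_range_histogram():
    # fp16 activations: mostly zeros plus one ~1e-5 value (illustrative data).
    activations = np.zeros(4096, dtype=np.float16)
    activations[-1] = 1e-5

    threshold = max(abs(activations.min()), abs(activations.max()))
    # The fix: pass the range endpoints as Python floats, not fp16 scalars.
    range_max = float(threshold)
    hist, edges = np.histogram(activations, NUM_BINS, range=(-range_max, range_max))

    # Every value lies inside the symmetric range, so nothing is dropped.
    assert hist.sum() == activations.size
    # A histogram with NUM_BINS bins has NUM_BINS + 1 edges.
    assert len(edges) == NUM_BINS + 1
    # No two consecutive edges collapsed to the same value (no zero-width bins).
    assert not np.any(np.diff(edges) == 0)
```

The function can be collected by pytest as-is or called directly.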

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅

Summary by CodeRabbit

  • Bug Fixes

    • Fixed INT8 entropy calibration for fp16 ONNX models failing with NumPy >= 2.0. Histogram range computation now correctly handles fp16 activations with small dynamic ranges.
  • Tests

    • Added test coverage for INT8 calibration with fp16 activations using narrow value ranges.


_collect_value derives threshold from activation min/max. For fp16
activations with a small range, both np.histogram range endpoints
inherit fp16 dtype. Under numpy 2.0 NEP-50 promotion the resulting
fp16 linspace collapses 128 bin edges and numpy raises
"Too many bins for data range". Cast range to Python float so bin
edges are computed in float64. Same fix applied to the single-node
calibration variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane requested a review from a team as a code owner May 28, 2026 15:49
@ajrasane ajrasane requested a review from vishalpandya1990 May 28, 2026 15:49
@coderabbitai bot (Contributor) commented May 28, 2026

📝 Walkthrough

This PR fixes INT8 entropy calibration failures on fp16 ONNX models running with NumPy >= 2.0. The fix casts histogram range endpoints to Python floats in two histogram collection code paths to ensure proper bin-edge computation, includes a regression test for fp16 narrow-range inputs, and documents the issue in release notes.

Changes

NumPy 2.0 histogram compatibility

| Layer | File(s) | Summary |
|---|---|---|
| Histogram range endpoint float casting | modelopt/onnx/quantization/ort_patching.py | Real-value and single-node histogram collection paths both cast the computed symmetric range endpoint to Python float before passing to np.histogram, preventing bin-edge collapse caused by inheriting narrow fp16 dtype precision. |
| FP16 narrow-range histogram regression test | tests/gpu/onnx/test_ort_patching.py | New test test_collect_value_fp16_narrow_range validates histogram collection on FP16 activations with extremely small dynamic range, confirming bin edges do not collapse, histogram counts sum correctly, and edge count matches expected value. |
| Release notes documentation | CHANGELOG.rst | Bug-fix entry documents the INT8 entropy calibration failure on NumPy >= 2.0 with fp16 ONNX models and references the histogram range endpoint casting fix in ort_patching. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main fix: addressing a NumPy >= 2.0 compatibility issue with fp16 ONNX INT8 entropy calibration, which matches the core changes across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | PR has no security anti-patterns: no unsafe deserialization, eval/exec, hardcoded trust flags, or # nosec bypasses. Changes safely cast fp16 threshold with inline comments. |


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/gpu/onnx/test_ort_patching.py (1)

156-170: ⚡ Quick win

Extend regression coverage to the single-node histogram path too.

This test validates _collect_value, but the same float-cast fix was also applied to _collect_value_histogram_collector_single_node_calibration. Consider parameterizing this test (or adding a sibling test) to assert identical fp16 narrow-range behavior there as well.

As per coding guidelines, “Use pytest for new/updated tests; keep tests lean and add coverage that guards the expected behavior/regression.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/gpu/onnx/test_ort_patching.py` around lines 156 - 170, add the same
fp16 narrow-range assertion for the single-node histogram path by running the
existing assertions against the alternative collector entry point
_collect_value_histogram_collector_single_node_calibration (either by
parameterizing test_collect_value_fp16_narrow_range with both _collect_value and
_collect_value_histogram_collector_single_node_calibration or by adding a
sibling test). Ensure you invoke the same setup (activations, name_to_arr,
mock_histogram_collector), call the single-node function, then assert hist.sum()
equals activations.size, len(edges) equals mock_histogram_collector.num_bins +
1, and no zero-width bins (not np.any(np.diff(edges) == 0)) to mirror the
original test’s checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 69514b50-b834-40bf-8abf-257dc247b070

📥 Commits

Reviewing files that changed from the base of the PR and between b49f9b9 and c4ceddc.

📒 Files selected for processing (3)
  • CHANGELOG.rst
  • modelopt/onnx/quantization/ort_patching.py
  • tests/gpu/onnx/test_ort_patching.py

@github-actions (Contributor)

PR Preview Action v1.8.1

🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1558/

Built to branch gh-pages at 2026-05-28 15:53 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@cjluo-nv (Collaborator) left a comment

Bot review. DM the bot to share feedback.

Small, well-scoped bug fix for fp16 ONNX INT8 entropy calibration on numpy >= 2.0. The root-cause analysis (NEP-50 strict promotion causing fp16 linspace bin-edge collapse) is correct, and the fix (casting threshold to Python float so np.histogram computes edges in float64) is minimal and applied at both relevant call sites in ort_patching.py. The new test_collect_value_fp16_narrow_range reproduces the failure mode and asserts no bin-edge collapse. CHANGELOG is updated. No API change, no licensing impact.

@ajrasane ajrasane enabled auto-merge (squash) May 28, 2026 16:00
@codecov bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.45%. Comparing base (b49f9b9) to head (c4ceddc).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1558      +/-   ##
==========================================
+ Coverage   69.43%   69.45%   +0.02%     
==========================================
  Files         477      477              
  Lines       51977    51979       +2     
==========================================
+ Hits        36090    36104      +14     
+ Misses      15887    15875      -12     
| Flag | Coverage Δ |
|---|---|
| examples | 33.65% <0.00%> (+0.82%) ⬆️ |
| gpu | 50.99% <100.00%> (-0.56%) ⬇️ |
| unit | 52.76% <50.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

modelopt/onnx/quantization/ort_patching.py
+    range_max = float(threshold)
     hist, hist_edges = np.histogram(
-        data_arr, histogram_collector.num_bins, range=(-threshold, threshold)
+        data_arr, histogram_collector.num_bins, range=(-range_max, range_max)
Contributor

This code is a duplicate of _collect_value(). Can we move it into a separate util function?

@gcunhase (Contributor) left a comment

LGTM once the minor comment is addressed.


3 participants