[5924759] Fix fp16 ONNX INT8 entropy calibration on numpy >= 2.0#1558
[5924759] Fix fp16 ONNX INT8 entropy calibration on numpy >= 2.0#1558ajrasane wants to merge 1 commit into
Conversation
_collect_value derives threshold from activation min/max. For fp16 activations with a small range, both np.histogram range endpoints inherit fp16 dtype. Under numpy 2.0 NEP-50 promotion the resulting fp16 linspace collapses 128 bin edges and numpy raises "Too many bins for data range". Cast range to Python float so bin edges are computed in float64. Same fix applied to the single-node calibration variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
📝 WalkthroughWalkthroughThis PR fixes INT8 entropy calibration failures on fp16 ONNX models running with NumPy >= 2.0. The fix casts histogram range endpoints to Python floats in two histogram collection code paths to ensure proper bin-edge computation, includes a regression test for fp16 narrow-range inputs, and documents the issue in release notes. ChangesNumPy 2.0 histogram compatibility
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/gpu/onnx/test_ort_patching.py (1)
156-170: ⚡ Quick winExtend regression coverage to the single-node histogram path too.
This test validates
_collect_value, but the same float-cast fix was also applied to_collect_value_histogram_collector_single_node_calibration. Consider parameterizing this test (or adding a sibling test) to assert identical fp16 narrow-range behavior there as well.As per coding guidelines, “Use pytest for new/updated tests; keep tests lean and add coverage that guards the expected behavior/regression.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/gpu/onnx/test_ort_patching.py` around lines 156 - 170, Add the same fp16 narrow-range assertion for the single-node histogram path by running the existing assertions against the alternative collector entry point _collect_value_histogram_collector_single_node_calibration (either by parameterizing test_collect_value_fp16_narrow_range with both _collect_value and _collect_value_histogram_collector_single_node_calibration or by adding a sibling test). Ensure you invoke the same setup (activations, name_to_arr, mock_histogram_collector), call the single-node function, then assert hist.sum() equals activations.size, len(edges) equals mock_histogram_collector.num_bins + 1, and no zero-width bins (not np.any(np.diff(edges) == 0)) to mirror the original test’s checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/gpu/onnx/test_ort_patching.py`:
- Around line 156-170: Add the same fp16 narrow-range assertion for the
single-node histogram path by running the existing assertions against the
alternative collector entry point
_collect_value_histogram_collector_single_node_calibration (either by
parameterizing test_collect_value_fp16_narrow_range with both _collect_value and
_collect_value_histogram_collector_single_node_calibration or by adding a
sibling test). Ensure you invoke the same setup (activations, name_to_arr,
mock_histogram_collector), call the single-node function, then assert hist.sum()
equals activations.size, len(edges) equals mock_histogram_collector.num_bins +
1, and no zero-width bins (not np.any(np.diff(edges) == 0)) to mirror the
original test’s checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 69514b50-b834-40bf-8abf-257dc247b070
📒 Files selected for processing (3)
CHANGELOG.rstmodelopt/onnx/quantization/ort_patching.pytests/gpu/onnx/test_ort_patching.py
|
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Small, well-scoped bug fix for fp16 ONNX INT8 entropy calibration on numpy >= 2.0. The root-cause analysis (NEP-50 strict promotion causing fp16 linspace bin-edge collapse) is correct, the fix (casting threshold to Python float so np.histogram computes edges in float64) is minimal and applied at both relevant call sites in ort_patching.py. The new test_collect_value_fp16_narrow_range reproduces the failure mode and asserts no bin-edge collapse. CHANGELOG is updated. No API change, no licensing impact.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1558 +/- ##
==========================================
+ Coverage 69.43% 69.45% +0.02%
==========================================
Files 477 477
Lines 51977 51979 +2
==========================================
+ Hits 36090 36104 +14
+ Misses 15887 15875 -12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| range_max = float(threshold) | ||
| hist, hist_edges = np.histogram( | ||
| data_arr, histogram_collector.num_bins, range=(-threshold, threshold) | ||
| data_arr, histogram_collector.num_bins, range=(-range_max, range_max) |
There was a problem hiding this comment.
This code is a duplicate from _collect_value(). Can we merge this into a separate util function?
gcunhase
left a comment
There was a problem hiding this comment.
LGTM after minor comment is addressed.
What does this PR do?
Type of change: Bug fix
INT8 entropy calibration of fp16 ONNX models (e.g. ConvNext / EfficientViT / YOLOv8 backbones quantized via
python -m modelopt.onnx.quantization --quantize_mode=int8) used to fail during histogram collection with:_collect_valueinmodelopt/onnx/quantization/ort_patching.pyderivesthreshold = max(abs(min), abs(max))from the activation tensor and passesrange=(-threshold, threshold)tonp.histogram(...). When the model is fp16 and a calibrated activation has a small range (≲ 1e-5), both endpoints inherit fp16 dtype. Under numpy 2.0's NEP-50 strict promotion, the resulting fp16linspacecollapses consecutive 128-bin edges to the same value and numpy refuses to build the histogram. numpy 1.x silently used higher-precision intermediate dtype, masking the issue.The fix casts the range endpoints to Python
floatso numpy computes bin edges in float64 regardless of input dtype. Applied at both call sites:_collect_valueand the single-node variant_collect_value_histogram_collector_single_node_calibration.Usage
# The affected workflow — INT8 entropy calibration of any fp16 ONNX model: python -m modelopt.onnx.quantization \ --quantize_mode=int8 \ --onnx_path=model.fp16.onnx \ --calibration_data_path=calib.npyNo API change.
Testing
test_collect_value_fp16_narrow_rangeintests/gpu/onnx/test_ort_patching.pythat calls_collect_valuewith a fp16 tensor (mostly zeros + one ~1e-5 value) and asserts the histogram is built without raising and all bin edges are distinct. Fails on the buggy code, passes after the fix.tests/gpu/onnx/test_ort_patching.pysuite (31 tests) passes.Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/ASummary by CodeRabbit
Bug Fixes
Tests