Autoquant and GPTQ in support in Megatron-Core [OMNIML-3151] by jenchen13 · Pull Request #1562 · NVIDIA/Model-Optimizer

jenchen13 · 2026-05-28T21:30:02Z

What does this PR do?

Type of change: New Feature

Adds AutoQuantize and GPTQ support for Megatron-Core.

Usage

# Add a code snippet demonstrating how to use this

Testing

TODO add a test for GPTQ mcore

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Release Notes

New Features
- Enhanced distributed auto-quantization with expert model parallelism support across all parallel groups
- Integrated Megatron framework compatibility for auto-quantization
- Added block quantization detection capabilities
- Introduced GPTQ layer-wise calibration support
Bug Fixes
- Fixed zero-sized input handling in calibration to prevent errors
Tests
- Added distributed quantization tests for expert parallelism configurations

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

coderabbitai · 2026-05-28T21:31:00Z

📝 Walkthrough

Walkthrough

AutoQuantize internals are updated to consistently incorporate expert model parallelism in distributed synchronization, refactor weight-size computation to derive from candidate statistics, introduce Megatron-specific auto-quantization support with lazy plugin registration, add block quantization detection properties, include a calibration zero-input guard, and add distributed and unit tests.
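To see why the expert-parallel (EP) group matters for synchronization, here is a minimal simulation in plain Python. It does not use the real `torch.distributed` or ModelOpt APIs: the `all_reduce_sum` and `pick_format` helpers, the format labels, and the score values are all hypothetical, and it assumes a lower score means a better format.

```python
from typing import Dict, List

Scores = Dict[str, float]


def all_reduce_sum(per_rank: List[Scores]) -> Scores:
    """Simulate a sum all-reduce: every rank ends up with the same totals."""
    total: Scores = {}
    for local in per_rank:
        for fmt, score in local.items():
            total[fmt] = total.get(fmt, 0.0) + score
    return total


def pick_format(scores: Scores) -> str:
    """Pick the format with the lowest score; ties break alphabetically."""
    return min(sorted(scores), key=lambda fmt: scores[fmt])


# Hypothetical per-rank scores for one layer group. Each EP rank only sees
# the tokens routed to its own experts, so the local scores disagree.
rank_scores = [
    {"fp8": 1.0, "nvfp4": 4.0},  # EP rank 0
    {"fp8": 5.0, "nvfp4": 1.5},  # EP rank 1
]

# Without a reduction over the EP group, each rank decides on its own.
local_choices = [pick_format(s) for s in rank_scores]

# With the reduction, every rank decides from identical totals.
reduced = all_reduce_sum(rank_scores)
synced_choices = [pick_format(reduced) for _ in rank_scores]

print(local_choices)   # ['fp8', 'nvfp4']
print(synced_choices)  # ['nvfp4', 'nvfp4']
```

In this toy setup the ranks diverge until the scores are reduced, which is the kind of inconsistency the new EP test is described as guarding against.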

Changes

AutoQuantize Expert Model Parallelism and Megatron Support

Layer / File(s)	Summary
EP group inclusion in distributed synchronization `modelopt/torch/quantization/algorithms.py`	Score and cost reductions now include `expert_model_parallel_group` alongside tensor/data parallel groups. Final format selection synchronizes across DP/TP/EP instead of DP/TP only. Extended quant grouping regex for NemotronH MCore "local_experts" fused linear naming.
Weight size computation refactoring `modelopt/torch/quantization/algorithms.py`	Introduced `_get_total_weight_size_from_candidate_stats()` to compute total no-quant weight from candidate statistics. Replaced direct module weight summation in `run_search()` and `_resolve_best_recipe()` with the new candidate-stats method.
Megatron auto-quantization plugin support `modelopt/torch/quantization/model_quant.py`, `modelopt/torch/quantization/plugins/megatron.py`	`auto_quantize()` lazily imports and registers Megatron support. Plugin defines module predicates, gradient-checkpoint context manager, weight parameter selection, and `register_megatron_autoquant_support()` for one-time registration with `AutoQuantizeGradientSearcher`. Adds GPTQ layerwise calibration via `get_mcore_decoder_layers()`.
Block quantization detection properties `modelopt/torch/quantization/nn/modules/tensor_quantizer.py`	TensorQuantizer exposes `is_block_quant` property, adds `is_dynamic_block_quant` for dynamic block mode, and refactors `is_static_block_quant` to reuse the new base property.
Calibration zero-input guard `modelopt/torch/quantization/utils/calib_utils.py`	`update_hessian()` now returns early when flattened input is zero-sized, preventing division-by-zero in subsequent update logic.
Distributed and unit tests `tests/gpu_megatron/torch/quantization/plugins/test_megatron.py`, `tests/unit/torch/quantization/test_autoquant.py`	Added distributed test for MoE with expert parallelism verifying consistent recipe across EP ranks. Added unit test confirming weight budget derives from candidate statistics rather than module scanning.
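The block quantization row above describes three layered properties. The sketch below shows one way such layering could look. `ToyQuantizer` is a hypothetical stand-in, not the real `TensorQuantizer`, and the convention that `block_sizes` is a dict with an optional `"type": "dynamic"` entry is an assumption made for illustration.

```python
from typing import Optional


class ToyQuantizer:
    """Hypothetical stand-in for a quantizer with optional block sizes."""

    def __init__(self, block_sizes: Optional[dict] = None):
        # Assumed convention: None means no block quantization;
        # {"type": "dynamic"} marks dynamic block mode.
        self.block_sizes = block_sizes

    @property
    def is_block_quant(self) -> bool:
        """Base property: any block configuration is present."""
        return self.block_sizes is not None

    @property
    def is_dynamic_block_quant(self) -> bool:
        """Block quantization whose scales are computed at run time."""
        return self.is_block_quant and self.block_sizes.get("type") == "dynamic"

    @property
    def is_static_block_quant(self) -> bool:
        """Block quantization that is not dynamic; reuses the base property."""
        return self.is_block_quant and self.block_sizes.get("type") != "dynamic"


per_tensor = ToyQuantizer()
static_block = ToyQuantizer({-1: 128})
dynamic_block = ToyQuantizer({-1: 16, "type": "dynamic"})
```

Deriving both specialized properties from one base check keeps the "is there a block config at all" logic in a single place.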

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

ChenhanYu

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title mentions Autoquant and GPTQ support for Megatron-Core, which aligns with the core changes across multiple files implementing this functionality.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No security anti-patterns found. All files pass SECURITY.md checks: no unsafe torch.load, numpy.load, hardcoded trust_remote_code, dangerous eval/exec, `#nosec` comments, or new dependencies.


github-actions · 2026-05-28T21:35:00Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1562/
Built to branch `gh-pages` at 2026-05-28 21:34 UTC. Preview will be ready when the GitHub Pages deployment is complete.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

modelopt/torch/quantization/utils/calib_utils.py (1)

60-61: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the docstring note to reflect the new behavior.

The note states that "input must be non-empty" and "a zero-sized input causes division by zero", but the new guard clause at lines 66-67 now handles batch_size == 0 gracefully. Update the docstring to reflect that empty inputs are now supported.

📝 Proposed docstring update

-    Note: input must be non-empty (batch_size > 0); a zero-sized input causes division by zero.
+    Note: Empty inputs (batch_size == 0) are handled gracefully and return unchanged hessian/n_samples.
+          This can occur in MoE models when some experts receive no tokens.
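A minimal sketch of why the guard matters, assuming the usual GPTQ-style running Hessian update (rescale the old Hessian by `n / (n + b)`, then add `2 / (n + b)` times the outer products of the new rows). `toy_update_hessian` is a hypothetical pure-Python stand-in and not the function in `calib_utils.py`; with `n_samples == 0` and an empty batch, the rescale factor would be `0 / 0` without the early return.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def toy_update_hessian(
    x_rows: List[List[float]], hessian: Matrix, n_samples: int
) -> Tuple[Matrix, int]:
    """Fold a batch of input rows into a running Hessian estimate."""
    batch = len(x_rows)
    if batch == 0:
        # Guard: an expert that received no tokens contributes nothing.
        return hessian, n_samples

    total = n_samples + batch
    decay = n_samples / total  # weight kept by the previous estimate
    scale = 2.0 / total        # weight given to the new outer products
    dim = len(hessian)
    updated = [
        [
            hessian[i][j] * decay + scale * sum(row[i] * row[j] for row in x_rows)
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return updated, total


h0 = [[0.0, 0.0], [0.0, 0.0]]
h_empty, n_empty = toy_update_hessian([], h0, 0)           # unchanged
h_one, n_one = toy_update_hessian([[1.0, 2.0]], h0, 0)     # first real batch
print(h_one, n_one)  # [[2.0, 4.0], [4.0, 8.0]] 1
```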

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/utils/calib_utils.py` around lines 60 - 61,
Update the docstring Note to reflect that empty inputs are now supported:
replace "input must be non-empty (batch_size > 0); a zero-sized input causes
division by zero" with a sentence stating that the function now handles
batch_size == 0 via the guard clause (which returns early when batch_size == 0)
and will not raise a division-by-zero error; mention that non-empty inputs are
still processed normally. Target the docstring for the function that contains
the guard checking batch_size == 0 (the docstring immediately above that guard)
and keep the wording brief and clear.

🧹 Nitpick comments (2)

modelopt/torch/quantization/plugins/megatron.py (1)

810-837: ⚡ Quick win

Document and export the newly added public APIs.

register_megatron_autoquant_support and get_mcore_decoder_layers are public (non-underscore) but only one has a docstring, and neither is reflected in __all__.

As per coding guidelines, "Document public APIs with docstrings, including examples when useful" and "Define the public API with __all__ at the top of each module".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/plugins/megatron.py` around lines 810 - 837, Add
a docstring to the newly public function get_mcore_decoder_layers describing
purpose, parameters, return type and an example, and ensure
register_megatron_autoquant_support also has appropriate public-docstring
coverage if needed; then export both symbols by adding
"register_megatron_autoquant_support" and "get_mcore_decoder_layers" to the
module's __all__ list at the top of the file so they are part of the public API
surface.

modelopt/torch/quantization/model_quant.py (1)

510-515: ⚡ Quick win

Don’t silently swallow plugin import failures.

Line 514 currently suppresses all ImportErrors, which can hide real regressions and make Megatron auto-quant support silently disappear. Emit a warning (or gate the exception type more narrowly) so failures are diagnosable.

Proposed change

     try:
         from .plugins.megatron import register_megatron_autoquant_support

         register_megatron_autoquant_support()
-    except ImportError:
-        pass
+    except ImportError as exc:
+        warnings.warn(
+            f"Skipping Megatron auto-quant support registration due to import error: {exc}",
+            RuntimeWarning,
+            stacklevel=2,
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/model_quant.py` around lines 510 - 515, The
current try/except around importing and calling
register_megatron_autoquant_support silently swallows ImportError; update the
block to either catch a more specific exception (e.g., ModuleNotFoundError for
the plugin import) or log a warning when import/call fails so failures are
visible; specifically wrap the import and call to
register_megatron_autoquant_support() and on failure call the module's logger or
warnings.warn/processLogger.warning with a clear message including the exception
text and that Megatron auto-quant support is disabled.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 30c2390a-c99c-4b41-8c0c-0be68734dc77

📥 Commits

Reviewing files that changed from the base of the PR and between d63bf70 and 2ba29fd.

📒 Files selected for processing (7)

modelopt/torch/quantization/algorithms.py
modelopt/torch/quantization/model_quant.py
modelopt/torch/quantization/nn/modules/tensor_quantizer.py
modelopt/torch/quantization/plugins/megatron.py
modelopt/torch/quantization/utils/calib_utils.py
tests/gpu_megatron/torch/quantization/plugins/test_megatron.py
tests/unit/torch/quantization/test_autoquant.py

coderabbitai · 2026-05-28T21:38:18Z

+    if hasattr(model, "output_layer") and layers:
+        layers.append(model.output_layer)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n -C3 'def get_mcore_decoder_layers|layers\.append\(model\.output_layer\)' modelopt/torch/quantization/plugins/megatron.py
rg -n -C4 'class LayerActivationCollector|register_decoder_layer_support|get_mcore_decoder_layers' --type=py modelopt/torch/quantization

Repository: NVIDIA/Model-Optimizer

Length of output: 4838

🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="modelopt/torch/quantization/utils/layerwise_calib.py"
rg -n "def get_decoder_layers\\b|get_decoder_layers\\(" "$file"
rg -n "LayerActivationCollector\\.get_decoder_layers|get_decoder_layers\\b" modelopt/torch/quantization -S --type=py
# Show the implementation region in layerwise_calib.py around get_decoder_layers and where layers are used.
# Use line ranges guessed from rg output by printing ~120 lines around matches.
python3 - <<'PY'
import re, subprocess, os, sys
file="modelopt/torch/quantization/utils/layerwise_calib.py"
out=subprocess.check_output(["rg","-n","def get_decoder_layers\\b",file],text=True)
m=re.search(r":(\d+):",out)
if not m:
    print("No get_decoder_layers def found")
    sys.exit(0)
ln=int(m.group(1))
start=max(1,ln-60); end=ln+160
print(f"\n--- {file}:{start}-{end} ---")
subprocess.run(["sed","-n",f"{start},{end}p",file],check=False)
PY

Repository: NVIDIA/Model-Optimizer

Length of output: 959

🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="modelopt/torch/quantization/utils/layerwise_calib.py"
rg -n "def get_decoder_layers\\b|get_decoder_layers\\(" "$file"
rg -n "LayerActivationCollector\\.get_decoder_layers|get_decoder_layers\\b" modelopt/torch/quantization -S --type=py
# Show implementation region around get_decoder_layers and where layers are used.
python3 - <<'PY'
import re, subprocess, sys
file="modelopt/torch/quantization/utils/layerwise_calib.py"
out=subprocess.check_output(["rg","-n","def get_decoder_layers\\b",file],text=True)
m=re.search(r":(\d+):",out)
if not m:
    print("No get_decoder_layers def found")
    sys.exit(0)
ln=int(m.group(1))
start=max(1,ln-60); end=ln+200
print(f"\n--- {file}:{start}-{end} ---")
subprocess.run(["sed","-n",f"{start},{end}p",file],check=False)
PY

Repository: NVIDIA/Model-Optimizer

Length of output: 959

🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="modelopt/torch/quantization/utils/layerwise_calib.py"
# Print the get_decoder_layers implementation and relevant callers/caching.
python3 - <<'PY'
import subprocess, re
file="modelopt/torch/quantization/utils/layerwise_calib.py"
rg_out=subprocess.check_output(["rg","-n","def get_decoder_layers\\b",file],text=True)
print(rg_out)
m=re.search(r":(\d+):", rg_out)
ln=int(m.group(1))
start=max(1, ln-40); end=ln+120
print(f"\n--- {file}:{start}-{end} ---")
subprocess.run(["sed","-n",f"{start},{end}p",file],check=False)
PY
# Also inspect where _decoder_layers is set and how often.
rg -n "_decoder_layers\\s*=\\s*self\\.get_decoder_layers|self\\._decoder_layers" -n modelopt/torch/quantization/utils/layerwise_calib.py
python3 - <<'PY'
import subprocess, re
file="modelopt/torch/quantization/utils/layerwise_calib.py"
out=subprocess.check_output(["rg","-n","_decoder_layers\\s*=\\s*self\\.get_decoder_layers|self\\._decoder_layers",file],text=True)
print(out)
PY
# Show the LayerActivationCollector class header and nearby doc/comments.
python3 - <<'PY'
import subprocess
file="modelopt/torch/quantization/utils/layerwise_calib.py"
out=subprocess.check_output(["rg","-n","class LayerActivationCollector\\b",file],text=True)
print(out)
ln=int(out.strip().split(":")[0])
start=max(1,ln-30); end=ln+60
print(f"\n--- {file}:{start}-{end} ---")
subprocess.run(["sed","-n",f"{start},{end}p",file],check=False)
PY
# Inspect any caching on the class method get_decoder_layers (class-level)
rg -n "cache|get_decoder_layers\\.|_decoder" modelopt/torch/quantization/utils/layerwise_calib.py

Repository: NVIDIA/Model-Optimizer

Length of output: 291

🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="modelopt/torch/quantization/utils/layerwise_calib.py"
# Show class and the full get_decoder_layers method body + immediate usages.
sed -n '90,320p' "$file" | nl -ba | sed -n '1,260p'
# Show how discoverer registry is used inside get_decoder_layers.
rg -n "get_decoder_layers\\b|_decoder_layer_support|register_decoder_layer_support" "$file" --context=2
# Show where get_decoder_layers is called and if it’s cached.
rg -n "_decoder_layers\\s*=\\s*self\\.get_decoder_layers|_decoder_layers\\b" "$file" --context=2
# Show calls from model_calib.py.
rg -n "LayerActivationCollector\\.get_decoder_layers" -S modelopt/torch/quantization/model_calib.py modelopt/torch/quantization -g'*.py' --context=2

Repository: NVIDIA/Model-Optimizer

Length of output: 109

🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="modelopt/torch/quantization/utils/layerwise_calib.py"
echo "=== LayerActivationCollector.get_decoder_layers section ==="
sed -n '120,320p' "$file" | awk '{printf "%4d:%s\n", NR+119, $0}'
echo
echo "=== Registry usage inside get_decoder_layers ==="
rg -n "register_decoder_layer_support|_decoder_layer_support|get_decoder_layers\\b" "$file" --context=2
echo
echo "=== _decoder_layers caching/assignment ==="
rg -n "_decoder_layers\\s*=\\s*self\\.get_decoder_layers|_decoder_layers\\b" "$file" --context=2
echo
echo "=== Calls into get_decoder_layers from other files ==="
rg -n "LayerActivationCollector\\.get_decoder_layers" -S modelopt/torch/quantization -g'*.py' --context=2

Repository: NVIDIA/Model-Optimizer

Length of output: 17318

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect model_calib usage around transformer_layers = LayerActivationCollector.get_decoder_layers(model)
sed -n '1720,1860p' modelopt/torch/quantization/model_calib.py | awk '{print NR+1720-1 ":" $0}'
# Find where LayerActivationCollector is instantiated in model_calib / elsewhere
rg -n "LayerActivationCollector\\(" modelopt/torch/quantization --type=py
rg -n "new LayerActivationCollector|_patch_all_layers|_decoder_layers" modelopt/torch/quantization/model_calib.py --type=py
# Count how many times get_decoder_layers is called for a given flow (static search)
rg -n "LayerActivationCollector\\.get_decoder_layers\\(" modelopt/torch/quantization --type=py
rg -n "is_supported\\(model\\)|LayerActivationCollector\\.is_supported\\(" modelopt/torch/quantization --type=py

Repository: NVIDIA/Model-Optimizer

Length of output: 6793

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "LayerActivationCollector\\.get_decoder_layers\\(" -S . --type=py
rg -n "LayerActivationCollector\\.is_supported\\(" -S . --type=py
rg -n "get_mcore_decoder_layers\\(" -S modelopt/torch/quantization --type=py

Repository: NVIDIA/Model-Optimizer

Length of output: 1600

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect tests that cover LayerActivationCollector.get_decoder_layers behavior.
sed -n '1,260p' tests/unit/torch/quantization/test_utils.py | awk '{print NR ":" $0}'
# Inspect megatron.py around get_mcore_decoder_layers and any __all__/new API functions.
sed -n '760,900p' modelopt/torch/quantization/plugins/megatron.py | awk '{print NR ":" $0}'

Repository: NVIDIA/Model-Optimizer

Length of output: 13437

Avoid in-place mutation in get_mcore_decoder_layers (prevents permanent side effects/duplicate output_layer).

In modelopt/torch/quantization/plugins/megatron.py:826-832, get_mcore_decoder_layers() takes model.decoder.layers and, whenever the model has an output_layer, appends model.output_layer to that same list in place before returning it, which permanently mutates the model. layerwise_calibrate() passes these layers directly into LayerActivationCollector._patch_all_layers, and _cleanup_layers() does not undo the appended module, so repeated calls to LayerActivationCollector.get_decoder_layers(model) will duplicate output_layer and lead to duplicated calibration and forward patching. Construct a fresh nn.ModuleList (a copy) when augmenting with output_layer, or make the augmentation idempotent.
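The side effect can be reproduced without PyTorch. In the sketch below, plain Python lists stand in for `nn.ModuleList`, and `ToyModel`, `get_layers_mutating`, and `get_layers_copying` are hypothetical names; only the two-line append pattern is taken from the diff under review.

```python
class ToyModel:
    """Hypothetical model: a list of decoder layers plus an output layer."""

    def __init__(self):
        self.decoder_layers = ["layer0", "layer1"]
        self.output_layer = "output_layer"


def get_layers_mutating(model):
    """Mirrors the reviewed pattern: appends to the model's own list."""
    layers = model.decoder_layers
    if hasattr(model, "output_layer") and layers:
        layers.append(model.output_layer)
    return layers


def get_layers_copying(model):
    """Non-mutating variant: augments a fresh copy instead."""
    layers = list(model.decoder_layers)
    if hasattr(model, "output_layer") and layers:
        layers.append(model.output_layer)
    return layers


mutated = ToyModel()
get_layers_mutating(mutated)
get_layers_mutating(mutated)
print(mutated.decoder_layers)
# ['layer0', 'layer1', 'output_layer', 'output_layer']

intact = ToyModel()
get_layers_copying(intact)
second_call = get_layers_copying(intact)
print(intact.decoder_layers)  # ['layer0', 'layer1']
print(second_call)            # ['layer0', 'layer1', 'output_layer']
```

With a real `nn.ModuleList` the copy would be built as a new `nn.ModuleList` rather than with `list(...)`, but the idempotence argument is the same.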

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/plugins/megatron.py` around lines 830 - 831,
get_mcore_decoder_layers is mutating model.decoder.layers by appending
model.output_layer which causes duplicated entries on repeated calls; instead
return a new nn.ModuleList (e.g., copy model.decoder.layers into a fresh
list/ModuleList) and append the output_layer to that new collection or check for
existence before appending so augmentation is idempotent; update
get_mcore_decoder_layers (and calls from
LayerActivationCollector.get_decoder_layers /
LayerActivationCollector._patch_all_layers) to use the non-mutating copy so
_cleanup_layers need not undo permanent changes.

codecov · 2026-05-28T21:43:45Z

Codecov Report

❌ Patch coverage is 42.22222% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.32%. Comparing base (d63bf70) to head (2ba29fd).

Files with missing lines	Patch %	Lines
modelopt/torch/quantization/plugins/megatron.py	4.16%	23 Missing ⚠️
modelopt/torch/quantization/model_quant.py	66.66%	1 Missing ⚠️
.../torch/quantization/nn/modules/tensor_quantizer.py	83.33%	1 Missing ⚠️
modelopt/torch/quantization/utils/calib_utils.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@                          Coverage Diff                          @@
##           feature/mcore_mse_mixed_precision    #1562      +/-   ##
=====================================================================
- Coverage                              69.35%   69.32%   -0.03%     
=====================================================================
  Files                                    478      478              
  Lines                                  52203    52242      +39     
=====================================================================
+ Hits                                   36203    36218      +15     
- Misses                                 16000    16024      +24
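As a quick check of the table above, the project coverage percentages can be recomputed from the Hits and Lines rows. The reported figures are consistent with truncating (not rounding) to two decimals; that is an inference from these numbers, not documented Codecov behavior, and `coverage_hundredths` is a hypothetical helper.

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (integer arithmetic)."""
    return hits * 10000 // lines


def fmt(hundredths: int) -> str:
    """Render hundredths of a percent as a signed-safe 'NN.NN%' string."""
    sign = "-" if hundredths < 0 else ""
    whole, frac = divmod(abs(hundredths), 100)
    return f"{sign}{whole}.{frac:02d}%"


base = coverage_hundredths(36203, 52203)  # base: feature/mcore_mse_mixed_precision
head = coverage_hundredths(36218, 52242)  # head: this PR
delta = head - base

print(fmt(base), fmt(head), fmt(delta))  # 69.35% 69.32% -0.03%
```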

Flag	Coverage Δ
examples	`33.59% <37.77%> (+<0.01%)`	⬆️
gpu	`50.84% <20.00%> (-0.03%)`	⬇️
regression	`15.22% <15.55%> (+<0.01%)`	⬆️
unit	`52.74% <40.00%> (-0.03%)`	⬇️

autoquant and gptq in mcore

2ba29fd

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 requested a review from a team as a code owner May 28, 2026 21:30

jenchen13 requested review from ajrasane, realAsma and sugunav14 and removed request for a team May 28, 2026 21:30

jenchen13 changed the title ~~Autoquant and GPTQ in support in Megatron-Core~~ Autoquant and GPTQ in support in Megatron-Core [OMNIML-3151] May 28, 2026

coderabbitai Bot reviewed May 28, 2026

jenchen13 wants to merge 1 commit into feature/mcore_mse_mixed_precision from jennifchen/mcore_autoquant_gptq
