Drive hf_ptq qformat choices from preset YAMLs (remove hardcoded CLI quant configs) by shengliangxu · Pull Request #1525 · NVIDIA/Model-Optimizer

shengliangxu · 2026-05-21T19:03:48Z

What does this PR do?

Type of change: Refactor

Replace the hardcoded QUANT_CFG_CHOICES / KV_QUANT_CFG_CHOICES dicts in examples/llm_ptq/hf_ptq.py with a lazy Mapping that discovers available qformat names by listing modelopt_recipes/configs/ptq/presets/{model,kv}/ and loads each YAML on first access via the existing load_config(..., schema_type=QuantizeConfig) path. The directory listing becomes the source of truth for --qformat / --kv_cache_qformat CLI vocabulary.
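
To make the discovery mechanism concrete, here is a minimal sketch of a lazy, directory-backed mapping. It is not the PR's `_PresetCfgChoices` class: the name `LazyPresetChoices`, the injectable `loader`, and the JSON stand-in files are all assumptions for illustration (the PR loads YAML through `load_config(..., schema_type=QuantizeConfig)`). The deepcopy on access mirrors the behavior described in the follow-up commit.

```python
import copy
import json
import tempfile
from collections.abc import Callable, Iterator, Mapping
from pathlib import Path
from typing import Any


class LazyPresetChoices(Mapping):
    """Hypothetical mapping: preset basename -> config, loaded on first access."""

    def __init__(
        self, preset_dir: Path, loader: Callable[[Path], dict], suffix: str
    ) -> None:
        self._loader = loader
        # The directory listing is the vocabulary: one key per preset file.
        self._paths = {p.stem: p for p in sorted(preset_dir.glob(f"*{suffix}"))}
        self._cache: dict = {}

    def __getitem__(self, name: str) -> Any:
        if name not in self._paths:
            raise KeyError(name)
        if name not in self._cache:
            # The file is read only the first time this name is requested.
            self._cache[name] = self._loader(self._paths[name])
        # Hand out a copy so callers can mutate it without touching the cache.
        return copy.deepcopy(self._cache[name])

    def __iter__(self) -> Iterator[str]:
        return iter(self._paths)

    def __len__(self) -> int:
        return len(self._paths)


def load_json(path: Path) -> dict:
    # Stand-in loader (assumption); the real script loads and validates YAML.
    return json.loads(path.read_text())


def demo() -> LazyPresetChoices:
    # Build a throwaway preset directory with two fake presets.
    tmp = Path(tempfile.mkdtemp())
    (tmp / "fp8.json").write_text(
        '{"quant_cfg": [{"quantizer_name": "*", "enable": true}]}'
    )
    (tmp / "nvfp4.json").write_text('{"quant_cfg": []}')
    return LazyPresetChoices(tmp, load_json, ".json")
```

Because the class implements `Mapping`, `list(choices)` and `name in choices` work the same way they did for the former plain dicts, so call sites and argparse `choices=` need no special handling.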

A small _QFORMAT_ALIASES table preserves previously-supported short CLI names (int8_sq, nvfp4_awq, fp8_pb_wo, ...) as deprecation shims. It is documented as not-for-extension — new formats land as preset YAMLs, and longer term, configurations should be authored as full recipes (--recipe).
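
The alias shim can be pictured as a single lookup performed before the preset lookup. The sketch below is an illustration only: `canonical_qformat`, `ALIASES`, and `PRESETS` are hypothetical names, and the only alias pair taken from this PR is `int8_sq` -> `int8_smoothquant`; the targets of the other legacy names are not stated here, so they are left out.

```python
from typing import Optional

# Legacy short name -> canonical preset basename.
# Only this pair is confirmed by the PR text; the real table has more entries.
ALIASES = {"int8_sq": "int8_smoothquant"}

# Stand-in for the discovered preset basenames (assumption: a plain set here,
# whereas the script derives them from the preset directory listing).
PRESETS = {"int8_smoothquant", "nvfp4_awq_full", "mxfp4"}


def canonical_qformat(name: str) -> Optional[str]:
    """Return the canonical preset basename for a CLI token, or None if unknown."""
    token = name.strip()
    canonical = ALIASES.get(token, token)
    return canonical if canonical in PRESETS else None
```

Resolving aliases in one place means validation sets and config lookups can both work on canonical names, so a legacy name and its canonical form cannot drift apart.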

Also adds presets/kv/fp8_cast.yaml and presets/kv/nvfp4_cast.yaml, composed from the existing kv_fp8_cast / kv_nvfp4_cast unit fragments. This promotes fp8_cast / nvfp4_cast to first-class KV presets and lets us delete the runtime _set_kv_cache_constant_amax helper and all three of its call sites — use_constant_amax is now authoritative in the YAML.

Side effect: every preset YAML under presets/model/ (mxfp4, mxfp6, mxint8, nvfp4_awq_full, nvfp4_fp8_mha, mamba_moe_*, ...) is now automatically exposed as a valid --qformat value with no further code change.

Usage

# Old short names still work via the alias shim
python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path <model> \
    --qformat int8_sq \
    --kv_cache_qformat fp8_cast \
    --export_path out/

# New canonical preset basenames work directly
python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path <model> \
    --qformat int8_smoothquant \
    --kv_cache_qformat fp8_cast \
    --export_path out/

# Newly-exposed presets (previously not on the CLI)
python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path <model> \
    --qformat nvfp4_awq_full \
    --export_path out/

Testing

Verified locally with both .venv (uv, py3.13) and the dev-py310-modelopt conda env:

All 20 previously-supported --qformat short names resolve and produce dicts that are exactly equal to the corresponding mtq.X_DEFAULT_CFG constants.
All 7 KV qformat names (fp8, fp8_cast, fp8_affine, nvfp4, nvfp4_cast, nvfp4_affine, nvfp4_rotate) resolve and match.
fp8_cast / nvfp4_cast YAML presets now contain use_constant_amax: true baked into the [kv]_bmm_quantizer cfg.
Non-cast variants (fp8, nvfp4) still do not set use_constant_amax (data-driven calibration preserved).
argparse accepts --kv_cache_qformat none plus all cast / affine / rotate variants.
Unknown qformats raise KeyError at lookup time and trigger an argparse invalid-choice error at the CLI.
All pre-commit hooks pass (ruff, mypy, bandit, license, yaml format, recipe validation).
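
The argparse behavior checked above can be reproduced in isolation. The following is a rough sketch, assuming a plain dict stands in for the discovered KV preset mapping; `KV_PRESETS`, `DEFAULT_KV`, `build_parser`, and `parse_kv_qformat` are hypothetical names, and the two import-time assertions mirror the guards discussed in the review comments rather than the script's exact code.

```python
import argparse
from typing import List, Optional

KV_NONE = "none"  # sentinel meaning "no KV-cache quantization"
DEFAULT_KV = "fp8_cast"

# Stand-in for the preset-backed mapping (assumption: real values are loaded
# from the presets/kv/ directory, not hardcoded).
KV_PRESETS = {"fp8": {}, "fp8_cast": {}, "nvfp4": {}, "nvfp4_cast": {}}

# Guard against a preset shadowing the sentinel or the default disappearing.
assert KV_NONE not in KV_PRESETS, "sentinel collides with a KV preset"
assert DEFAULT_KV in KV_PRESETS, "default KV preset is missing"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--kv_cache_qformat",
        default=DEFAULT_KV,
        # Iterating a mapping yields its keys, so new presets appear here
        # automatically.
        choices=[KV_NONE, *KV_PRESETS],
    )
    return parser


def parse_kv_qformat(argv: List[str]) -> Optional[str]:
    """Return the parsed value, or None when argparse rejects the input."""
    try:
        return build_parser().parse_args(argv).kv_cache_qformat
    except SystemExit:
        # argparse reports an invalid choice on stderr and exits with status 2.
        return None
```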

Before your PR is "Ready for review"

Is this change backward compatible?: ✅ — all previously-valid --qformat and --kv_cache_qformat values continue to work via the alias table; output configs are bit-equivalent to the prior hardcoded path.
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A — no new deps.
Did you write any new necessary tests?: ❌ — existing PTQ integration tests exercise these qformats; the refactor is config-equivalence-preserving and was spot-verified against mtq.X_DEFAULT_CFG constants.
Did you update Changelog?: ❌ — happy to add an entry; treating this as an internal refactor for now.
Did you get Claude approval on this PR?: ❌ — will run /claude review once ready.

Additional Information

Two new YAML presets: modelopt_recipes/configs/ptq/presets/kv/{fp8_cast,nvfp4_cast}.yaml.
Deletes: _set_kv_cache_constant_amax helper + all 3 call sites in hf_ptq.py.
multinode_ptq.py is intentionally untouched (out of scope for this branch).

Summary by CodeRabbit

New Features
- Added FP8-cast and NVFP4-cast KV-cache quantization presets to expand supported quantization options.
Refactor
- qformat and KV-cache qformat choices are now discovered from preset files, with backward-compatible aliases preserved.
- KV-cache calibration/enabling behavior is driven by presets rather than a runtime override; CLI validation and help updated to reflect preset-driven choices.
Documentation
- Changelog updated to describe preset-driven options and new KV presets.

copy-pr-bot · 2026-05-21T19:03:52Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

coderabbitai · 2026-05-21T19:03:55Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

This PR refactors quantization configuration discovery in a PTQ example script from static mtq.*_CFG dictionaries to lazy-loaded YAML presets. It introduces a memoized preset loader, applies it across all quantization pipelines, removes post-hoc configuration overrides, and adds new KV-cache cast preset files.

Changes

PTQ quantization preset refactoring

Layer / File(s)	Summary
Preset infrastructure foundation `examples/llm_ptq/hf_ptq.py` (`line 21`, `59`, `70–75`, `95–229`)	Imports support for lazy mappings and recipe utilities. Defines `BUILTIN_CONFIG_ROOT`, `_PresetCfgChoices` lazy mapping class, preset directory constants, backward-compatible qformat aliases, `_KV_NONE` sentinel, and `_AUTO_QUANTIZE_QFORMATS` validation set. Replaces static `QUANT_CFG_CHOICES` and `KV_QUANT_CFG_CHOICES` dicts with discovered preset mappings.
Using presets across quantization pipelines `examples/llm_ptq/hf_ptq.py` (`lines 406–408`, `484–491`, `517–524`, `1174`, `1186–1193`, `1206`)	Auto-quantize, low-memory, and mono quantization paths now retrieve KV-cache configs from `KV_QUANT_CFG_CHOICES` preset mappings instead of `mtq` module lookups. KV-cache enabling switches from string `"none"` to `_KV_NONE` sentinel. Removes `_set_kv_cache_constant_amax` helper and post-hoc override logic that forced `use_constant_amax` for cast formats. Updates error messages and validation to reflect the new preset mapping type.
CLI argument updates `examples/llm_ptq/hf_ptq.py` (`line 1356`)	`--kv_cache_qformat` argument `choices` now includes `_KV_NONE` and dynamic keys from `KV_QUANT_CFG_CHOICES` preset mapping instead of static dict keys.
New KV-cache cast presets & changelog `modelopt_recipes/configs/ptq/presets/kv/fp8_cast.yaml`, `modelopt_recipes/configs/ptq/presets/kv/nvfp4_cast.yaml`, `CHANGELOG.rst`	Adds FP8 E4M3 and NVFP4 KV-cache cast preset YAML files (each with `imports` and `$import` usage in `quant_cfg`) and updates the changelog describing preset-driven CLI discovery and removal of runtime `use_constant_amax` patching.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers:

Edwardf0t1
realAsma
meenchen

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: replacing hardcoded CLI quantization config choices with YAML preset discovery.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No SECURITY.md violations found: safe yaml.safe_load_all in config_loader, trust_remote_code is a CLI arg with default=False, no eval/exec/nosec, no new unsafe dependencies.

codecov · 2026-05-21T19:18:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.32%. Comparing base (9d0d978) to head (6204de4).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1525      +/-   ##
==========================================
+ Coverage   76.82%   77.32%   +0.50%     
==========================================
  Files         477      477              
  Lines       51957    51957              
==========================================
+ Hits        39916    40178     +262     
+ Misses      12041    11779     -262

Flag	Coverage Δ
examples	`41.69% <ø> (+0.96%)`	⬆️
unit	`52.76% <ø> (ø)`

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 1

🧹 Nitpick comments (1)

examples/llm_ptq/hf_ptq.py (1)

130-133: ⚡ Quick win

Derive KV calibration skip from config semantics, not hardcoded format names.

Line 476 hardcodes cast-format names via _KV_CAST_FORMATS. Since presets are YAML-driven now, this risks drift when presets evolve. Prefer checking whether the selected KV config actually needs calibration.

Suggested refactor

-        if args.kv_cache_qformat not in _KV_CAST_FORMATS:
+        if need_calibration({"quant_cfg": kv_cache_quant_cfg, "algorithm": "max"}):
             # Calibrate only the KV cache quantizers; disable all others.
             with mtq.set_quantizer_by_cfg_context(
                 language_model,
                 [{"quantizer_name": "*", "enable": False}, *kv_cache_quant_cfg],
             ):
                 mtq.calibrate(language_model, algorithm="max", forward_loop=calibrate_loop)

Also applies to: 476-483

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/hf_ptq.py` around lines 130 - 133, Replace the hardcoded
_KV_CAST_FORMATS check with a semantic check on the chosen KV preset: instead of
testing the format name via _KV_CAST_FORMATS, inspect the selected KV
configuration object (the loaded preset used for KV, e.g., the variable that
selects the KV preset in this module—refer to the code that chooses the "kv"
preset) and decide to skip calibration when that config explicitly pins
use_constant_amax (or an equivalent flag like
requires_calibration/use_constant_amax) — remove the frozenset usage and branch
on the KV config's semantic field so YAML-driven presets control the behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 194-211: The CLI validation currently tests raw tokens against
_AUTO_QUANTIZE_QFORMATS, which rejects valid canonical names because
canonical/alias resolution happens later via QUANT_CFG_CHOICES; change the
validation to check against the full set of accepted keys (e.g., use
QUANT_CFG_CHOICES.keys() or build a normalized set of canonical names/aliases)
or resolve each token through the same lookup used later before rejecting.
Update the checks that reference _AUTO_QUANTIZE_QFORMATS (and any logic around
parsing auto-quantize tokens) to use QUANT_CFG_CHOICES (or a derived normalized
set) so canonical names and aliases are accepted consistently.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 61cb3534-3f30-4331-b8b7-3a3cf32cca68

📥 Commits

Reviewing files that changed from the base of the PR and between c9098b6 and aae0fe1.

📒 Files selected for processing (3)

examples/llm_ptq/hf_ptq.py
modelopt_recipes/configs/ptq/presets/kv/fp8_cast.yaml
modelopt_recipes/configs/ptq/presets/kv/nvfp4_cast.yaml

coderabbitai

♻️ Duplicate comments (1)

examples/llm_ptq/hf_ptq.py (1)

403-407: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize auto-quantize qformats before checking _AUTO_QUANTIZE_QFORMATS.

Line 406 validates raw CLI tokens, but Line 465 resolves them through QUANT_CFG_CHOICES, which now accepts both canonical preset basenames and legacy aliases. A canonical preset like int8_smoothquant is therefore accepted later but rejected here first.

💡 Suggested fix

-    qformat_list = args.qformat.split(",")
+    qformat_list = [q.strip() for q in args.qformat.split(",")]
     assert qformat_list, "No quantization formats provided"
-    # Check if all provided quantization formats are supported
-    assert all(qformat in _AUTO_QUANTIZE_QFORMATS for qformat in qformat_list), (
+    canonical_qformats = [
+        QUANT_CFG_CHOICES._canonical(qformat) if isinstance(QUANT_CFG_CHOICES, _PresetCfgChoices) else qformat
+        for qformat in qformat_list
+    ]
+    assert all(qformat is not None for qformat in canonical_qformats), (
+        "Unsupported quantization format provided"
+    )
+    assert all(qformat in _AUTO_QUANTIZE_QFORMATS for qformat in canonical_qformats), (
         "One or more quantization formats provided are not supported for unified checkpoint export"
     )

Also applies to: 465-465

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/hf_ptq.py` around lines 403 - 407, The assertion is checking
raw CLI tokens in qformat_list against _AUTO_QUANTIZE_QFORMATS before they are
normalized later; update the validation to normalize each args.qformat token
using the same resolution used at Line 465 (QUANT_CFG_CHOICES/its alias mapping)
and then check the normalized canonical names against _AUTO_QUANTIZE_QFORMATS.
Concretely, transform qformat_list by mapping each entry through the
QUANT_CFG_CHOICES lookup (or its alias→canonical resolver) to produce
canonical_qformats, then assert canonical_qformats is non-empty and that all
entries are in _AUTO_QUANTIZE_QFORMATS (referencing qformat_list, args.qformat,
QUANT_CFG_CHOICES, and _AUTO_QUANTIZE_QFORMATS).

🧹 Nitpick comments (1)

examples/llm_ptq/hf_ptq.py (1)

487-499: ⚡ Quick win

Drive KV-cache calibration skipping from the preset config, not the preset name.

Line 493 reintroduces a hardcoded name check after this refactor made YAML authoritative. If a future KV preset sets use_constant_amax, it will be CLI-exposed here and then immediately recalibrated anyway.

💡 Suggested fix

         kv_cache_quant_cfg = copy.deepcopy(KV_QUANT_CFG_CHOICES[args.kv_cache_qformat]["quant_cfg"])
         kv_cache_quant_cfg = [
             e for e in kv_cache_quant_cfg if e["quantizer_name"] != "*"
         ]  # keep other quantizers from auto_quantize

         mtq.set_quantizer_by_cfg(language_model, quant_cfg=kv_cache_quant_cfg)
-        if args.kv_cache_qformat not in _KV_CAST_FORMATS:
+        needs_kv_calibration = any(
+            not entry.get("use_constant_amax", False) for entry in kv_cache_quant_cfg
+        )
+        if needs_kv_calibration:
             # Calibrate only the KV cache quantizers; disable all others.
             with mtq.set_quantizer_by_cfg_context(
                 language_model,
                 [{"quantizer_name": "*", "enable": False}, *kv_cache_quant_cfg],
             ):
                 mtq.calibrate(language_model, algorithm="max", forward_loop=calibrate_loop)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/hf_ptq.py` around lines 487 - 499, The code currently
decides whether to skip KV-cache calibration by checking the preset name
(args.kv_cache_qformat not in _KV_CAST_FORMATS); instead, inspect the actual
preset config (kv_cache_quant_cfg) and skip calibration when the preset
indicates constant amax behavior. Replace the name-based condition with a
config-based check (e.g., if not any(entry.get("use_constant_amax") for entry in
kv_cache_quant_cfg): ... ) so mtq.calibrate(...) runs only when none of the KV
quantizer entries specify use_constant_amax; reference kv_cache_quant_cfg,
KV_QUANT_CFG_CHOICES, args.kv_cache_qformat, and mtq.calibrate in your change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a74f2b60-827e-4805-a9ff-ac9644c33ec5

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff7c9e and 76fc552.

📒 Files selected for processing (1)

examples/llm_ptq/hf_ptq.py

Replace the hardcoded QUANT_CFG_CHOICES / KV_QUANT_CFG_CHOICES dicts in examples/llm_ptq/hf_ptq.py with a lazy Mapping that discovers available qformat names by listing modelopt_recipes/configs/ptq/presets/{model,kv}/ and loads each YAML on first access via the existing load_config(..., schema_type=QuantizeConfig) path.

A small _QFORMAT_ALIASES table keeps the previously-supported short CLI names (int8_sq, nvfp4_awq, fp8_pb_wo, ...) working as deprecation shims; the table is documented as not-for-extension since new formats should land as preset YAMLs (or, longer term, as full recipes).

Also add presets/kv/fp8_cast.yaml and presets/kv/nvfp4_cast.yaml so fp8_cast / nvfp4_cast become first-class KV presets composed from the existing kv_fp8_cast / kv_nvfp4_cast unit fragments. This drops the KV alias entries and lets us delete the runtime _set_kv_cache_constant_amax helper and all three of its call sites; use_constant_amax is now authoritative in the YAML.

Side effect: every preset YAML under presets/model/ (mxfp4, mxfp6, mxint8, nvfp4_awq_full, nvfp4_fp8_mha, mamba_moe_*, ...) is now automatically exposed as a valid --qformat value with no further code change.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

- Deepcopy in _PresetCfgChoices.__getitem__ so callers can freely mutate the returned quant_cfg without poisoning the cache.
- Assert that _KV_NONE does not collide with any discovered KV preset.
- Expand the comment on _AUTO_QUANTIZE_QFORMATS explaining why it stays hardcoded (auto_quantize compatibility is an export-path property, not a YAML-derivable one).
- Add CHANGELOG entry for the qformat discovery refactor and the fp8_cast / nvfp4_cast preset promotion (including the note that out-of-tree recipes targeting cast KV must set use_constant_amax themselves now that the runtime override is gone).

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

Codify the policy that the preset directory listing IS the CLI vocabulary — there is intentionally no separate allow-list. New presets are CLI-visible the moment they land in the directory; this is a feature, not an oversight. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

1. Auto-quantize validation: _AUTO_QUANTIZE_QFORMATS previously only listed the short alias names, so passing canonical preset basenames (e.g. int8_smoothquant instead of int8_sq) would be rejected even though the underlying configs are identical. Switch the set to canonical names and canonicalize incoming tokens via a new _canonical_qformat() helper so both forms are accepted.
2. KV cast detection: replace the hardcoded _KV_CAST_FORMATS = {fp8_cast, nvfp4_cast} name set with a semantic check (_kv_cfg_uses_constant_amax) that inspects the loaded KV cfg's *[kv]_bmm_quantizer entry for use_constant_amax. This makes "should we skip KV calibration?" YAML-driven: any future cast-style KV preset works without touching this script.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
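
As an illustration of the semantic KV check described in this commit, the sketch below inspects a loaded `quant_cfg` list rather than the preset name. It is not the script's `_kv_cfg_uses_constant_amax`: the function name, the entry layout, and in particular whether `use_constant_amax` sits at the top level of an entry or under a nested `cfg` key are assumptions, so the sketch accepts either placement.

```python
from typing import Any, Dict, List

# Quantizer-name pattern quoted in the commit message.
KV_QUANTIZER_PATTERN = "*[kv]_bmm_quantizer"


def kv_cfg_uses_constant_amax(quant_cfg: List[Dict[str, Any]]) -> bool:
    """Return True if the KV quantizer entry pins a constant amax."""
    for entry in quant_cfg:
        if entry.get("quantizer_name") != KV_QUANTIZER_PATTERN:
            continue  # ignore entries for other quantizers
        nested = entry.get("cfg") or {}
        # Assumption: the flag may live on the entry itself or under "cfg".
        if entry.get("use_constant_amax") or nested.get("use_constant_amax"):
            return True
    return False


# Hypothetical configs: a cast-style preset and a data-calibrated one.
cast_cfg = [
    {"quantizer_name": "*", "enable": False},
    {"quantizer_name": "*[kv]_bmm_quantizer", "cfg": {"use_constant_amax": True}},
]
calibrated_cfg = [
    {"quantizer_name": "*[kv]_bmm_quantizer", "cfg": {"enable": True}},
]
```

A caller would then run KV calibration only when the function returns False, which keeps the decision tied to the preset contents instead of a list of format names.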

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 1

🧹 Nitpick comments (1)

examples/llm_ptq/hf_ptq.py (1)

210-215: ⚡ Quick win

Validate the KV default against the dynamic preset registry.

--kv_cache_qformat now derives its valid values from KV_QUANT_CFG_CHOICES, but the default is still the literal "fp8_cast". If that preset is renamed or removed, the parser will still inject an invalid default and the first lookup later will fail. Hoist the default into a constant and check it next to the _KV_NONE collision guard.

💡 Consistency check

 _KV_NONE = "none"
+_DEFAULT_KV_QFORMAT = "fp8_cast"
 ...
 assert _KV_NONE not in KV_QUANT_CFG_CHOICES, (
     f"_KV_NONE sentinel {_KV_NONE!r} collides with a KV preset; rename the preset."
 )
+assert _DEFAULT_KV_QFORMAT in KV_QUANT_CFG_CHOICES, (
+    f"Default KV preset {_DEFAULT_KV_QFORMAT!r} is not present; update the default."
+)
 ...
     parser.add_argument(
         "--kv_cache_qformat",
         required=False,
-        default="fp8_cast",
+        default=_DEFAULT_KV_QFORMAT,
         choices=[_KV_NONE, *KV_QUANT_CFG_CHOICES],

Also applies to: 1365-1368

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/hf_ptq.py` around lines 210 - 215, The default value for
--kv_cache_qformat is hardcoded as "fp8_cast" and may become invalid if the
dynamic preset registry KV_QUANT_CFG_CHOICES changes; hoist that literal into a
named constant (e.g., DEFAULT_KV_QFORMAT = "fp8_cast") and add an assertion that
DEFAULT_KV_QFORMAT is present in KV_QUANT_CFG_CHOICES immediately alongside the
existing _KV_NONE collision guard (which checks _KV_NONE not in
KV_QUANT_CFG_CHOICES). Repeat the same hoist-and-assert change for the other
preset-usage site referenced in the file (the earlier block that currently uses
the literal "fp8_cast") so both parser defaults are validated against
KV_QUANT_CFG_CHOICES.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 544-552: When running in low-memory mode the code currently seeds
init_quantized_weights() from args.qformat/args.kv_cache_qformat
(QUANT_CFG_CHOICES, _KV_NONE, mtq.utils.update_quant_cfg_with_kv_cache_quant),
which conflicts with later quantize_main() when a --recipe is provided; add an
explicit reject: if args.low_memory_mode and args.recipe is set, raise/exit with
a clear message that recipes are not supported in low-memory mode (or
alternatively ensure this branch reads/uses the recipe quant config instead), so
the pre-instrumented quantizer layout cannot diverge from the recipe-driven
layout.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4095c6fc-e78b-4a31-a837-a01c308d9261

📥 Commits

Reviewing files that changed from the base of the PR and between 29f03fa and 6204de4.

📒 Files selected for processing (4)

CHANGELOG.rst
examples/llm_ptq/hf_ptq.py
modelopt_recipes/configs/ptq/presets/kv/fp8_cast.yaml
modelopt_recipes/configs/ptq/presets/kv/nvfp4_cast.yaml

✅ Files skipped from review due to trivial changes (2)

CHANGELOG.rst
modelopt_recipes/configs/ptq/presets/kv/nvfp4_cast.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

modelopt_recipes/configs/ptq/presets/kv/fp8_cast.yaml

coderabbitai · 2026-05-27T00:09:01Z

        assert args.qformat in QUANT_CFG_CHOICES, (
-            f"Quantization format is not supported for low memory mode. Supported formats: {QUANT_CFG_CHOICES.keys()}"
+            f"Quantization format is not supported for low memory mode. Supported formats: {list(QUANT_CFG_CHOICES)}"
        )
        quant_cfg = QUANT_CFG_CHOICES[args.qformat]
-        if args.kv_cache_qformat != "none":
+        if args.kv_cache_qformat != _KV_NONE:
            quant_cfg = mtq.utils.update_quant_cfg_with_kv_cache_quant(
                quant_cfg,
-                getattr(mtq, KV_QUANT_CFG_CHOICES[args.kv_cache_qformat])["quant_cfg"],
+                KV_QUANT_CFG_CHOICES[args.kv_cache_qformat]["quant_cfg"],
            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject --recipe in low-memory mode until this path uses the recipe config.

This branch still seeds init_quantized_weights() from args.qformat / args.kv_cache_qformat. With --low_memory_mode --recipe ..., the model is pre-instrumented with a different quantizer layout than quantize_main() later applies, so the new “recipe is authoritative” contract is broken before quantization even starts.

💡 Minimal safe fix

     else:
+        if args.recipe is not None:
+            raise ValueError(
+                "--low_memory_mode does not yet support --recipe; the low-memory "
+                "loader still initializes quantizers from --qformat/--kv_cache_qformat."
+            )
         assert args.qformat in QUANT_CFG_CHOICES, (
             f"Quantization format is not supported for low memory mode. Supported formats: {list(QUANT_CFG_CHOICES)}"
         )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_ptq/hf_ptq.py` around lines 544 - 552, When running in low-memory mode the code currently seeds init_quantized_weights() from args.qformat/args.kv_cache_qformat (QUANT_CFG_CHOICES, _KV_NONE, mtq.utils.update_quant_cfg_with_kv_cache_quant), which conflicts with later quantize_main() when a --recipe is provided; add an explicit reject: if args.low_memory_mode and args.recipe is set, raise/exit with a clear message that recipes are not supported in low-memory mode (or alternatively ensure this branch reads/uses the recipe quant config instead), so the pre-instrumented quantizer layout cannot diverge from the recipe-driven layout.

shengliangxu changed the title ~~Drive hf_ptq qformat choices from preset YAMLs~~ Drive hf_ptq qformat choices from preset YAMLs (remove hardcoded CLI quant configs) May 21, 2026

shengliangxu marked this pull request as ready for review May 21, 2026 19:12

shengliangxu requested review from a team as code owners May 21, 2026 19:12

shengliangxu requested review from cjluo-nv, meenchen and realAsma May 21, 2026 19:12

coderabbitai Bot reviewed May 21, 2026

Comment thread examples/llm_ptq/hf_ptq.py

coderabbitai Bot reviewed May 21, 2026

coderabbitai Bot approved these changes May 21, 2026

shengliangxu added 4 commits May 26, 2026 16:59

shengliangxu force-pushed the shengliangx/hf-ptq-dereference-hardcoded-configs branch from 29f03fa to 6204de4 Compare May 27, 2026 00:00

coderabbitai Bot reviewed May 27, 2026

shengliangxu wants to merge 4 commits into main from shengliangx/hf-ptq-dereference-hardcoded-configs