[TRTLLM-12507][feat] Add routed-expert LoRA helpers#14764
Conversation
7dd0c55 to
2b57061
Compare
📝 WalkthroughWalkthroughThis PR adds MoE LoRA shared-outer layout metadata support. It introduces ChangesMoE LoRA Shared-Outer Layout
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/modules/fused_moe/create_moe.py (1)
182-203:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign MoE LoRA quantization validation with per-layer mode
create_moe.pycomputeshas_quantfromeffective_quant_config.layer_quant_mode.has_any_quant(exclude_kv_cache=True)to decide whether to useCutlassFusedMoE.tensorrt_llm/_torch/peft/lora/validation.py::check_moe_lora_supporteddecides LoRA support fromquant_config.quant_mode(viaquant_mode.has_any_quant(exclude_kv_cache=True)).
Ifquant_modeandlayer_quant_modediffer under mixed/per-layer quantization, the validator can accept/reject based on a different “quantized-ness” than the backend-selection logic. Pass the per-layer quant mode (or a layer-derivedquant_mode) into the validator for consistency.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py` around lines 182 - 203, The LoRA validator is using the global quant_config while the backend decision uses the per-layer quant mode; update the call to check_moe_lora_supported in create_moe.py to pass the per-layer quant mode (from effective_quant_config.layer_quant_mode) instead of the top-level quant_config so both checks use the same per-layer quantization view; ensure you handle effective_quant_config being None (pass None or a sensible default quant_mode) and keep the resolved_backend logic (resolved_backend variable and layer_idx) unchanged so diagnostics still map non-Cutlass resolutions back to the requested backend name.
🧹 Nitpick comments (5)
tensorrt_llm/_torch/peft/lora/validation.py (1)
16-24: ⚡ Quick winUse built-in
setinstead oftyping.Set.
_normalize_targetsreturnsSet[str]; preferset[str]. LikewiseOptional[...]can useX | Noneper the repo style.♻️ Proposed change
-from typing import Iterable, Optional, Set +from collections.abc import Iterable +from typing import Optional-def _normalize_targets(lora_target_modules: Iterable[str]) -> Set[str]: +def _normalize_targets(lora_target_modules: Iterable[str]) -> set[str]:As per coding guidelines: "Prefer built-in types
list,dict,tupleover legacytyping.List,typing.Dict,typing.Tuple".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/peft/lora/validation.py` around lines 16 - 24, Replace legacy typing generics with built-in generics and remove unused typing imports: change the return annotation of _normalize_targets from typing.Set[str] to set[str], update any Optional[...] occurrences to the modern X | None form, and remove unused imports like Set and Optional from the top; update the import line so only Iterable is imported from typing (or remove typing entirely if not needed) while leaving references to LoraConfig and MOE_LORA_MODULE_NAMES untouched.tensorrt_llm/_torch/peft/lora/moe_layout.py (1)
9-9: ⚡ Quick winPrefer built-in generics over
typing.Dict/typing.Tuple.This new module uses legacy
typing.Dict/typing.Tuplealiases (e.g. Lines 16, 24, 41, 63, 88 return types). Per the repo guideline, use built-indict/tuple(andint | NoneforOptional[int]).♻️ Example
-from typing import Dict, Literal, Optional, Tuple +from typing import Literal, Optional-MOE_LORA_MODULES: Tuple[str, ...] = ("moe_h_to_4h", "moe_4h_to_h", "moe_gate") +MOE_LORA_MODULES: tuple[str, ...] = ("moe_h_to_4h", "moe_4h_to_h", "moe_gate")-DEFAULT_SHARED_SIDE: Dict[str, SharedSide] = { +DEFAULT_SHARED_SIDE: dict[str, SharedSide] = {(and
-> Dict[str, torch.Tensor]→-> dict[str, torch.Tensor])As per coding guidelines: "Prefer built-in types
list,dict,tupleover legacytyping.List,typing.Dict,typing.Tuple".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/peft/lora/moe_layout.py` at line 9, The type annotations in moe_layout.py use legacy typing aliases (typing.Dict, typing.Tuple, Optional) in function/method return and parameter annotations; replace all occurrences with built-in generics (dict, tuple) and use PEP‑604 union syntax for optional types (e.g., int | None) or plain None union (e.g., torch.Tensor | None) — update each signature and any import so that typing.Dict/typing.Tuple/Optional are removed and signatures like -> Dict[str, torch.Tensor] become -> dict[str, torch.Tensor], Tuple[...] become tuple[...], and Optional[int] becomes int | None (ensure any imports from typing are cleaned up accordingly).tests/unittest/others/test_lora_manager.py (1)
188-196: ⚡ Quick winMake Optional explicit in function signature.
The parameter
layout_metadata: dict = Nonehas an implicitOptionaltype, which violates PEP 484. Use the|syntax to make it explicit.♻️ Proposed fix
def _create_dummy_moe_hf_lora_adapter( adapter_dir: Path, hidden_size: int = 64, intermediate_size: int = 128, rank: int = 8, num_layers: int = 2, num_experts: int = 4, - layout_metadata: dict = None, + layout_metadata: dict | None = None, ):As per coding guidelines, use
|syntax instead oftyping.Union, and avoid implicitOptionalby makingNonedefaults explicit in the type.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/others/test_lora_manager.py` around lines 188 - 196, The function _create_dummy_moe_hf_lora_adapter has an implicitly optional parameter layout_metadata: dict = None; update its signature to make the Optional explicit using PEP 604 syntax (e.g., layout_metadata: dict | None = None) so the type is explicit and compliant; modify only the parameter annotation in _create_dummy_moe_hf_lora_adapter (no behavioral changes) and run tests to confirm typing annotations are accepted.tensorrt_llm/lora_manager.py (1)
27-35: ⚡ Quick winPrefer Python 3.10+ built-in generic types in new imports.
The new imports use legacy
typinggenerics. Since this codebase targets Python 3.10+, prefer built-indictoverDictin type hints throughout the new code.♻️ Refactor imports
from .lora_layout_metadata import ( - all_false_flags as _moe_all_false_shared_flags, -) -from .lora_layout_metadata import ( - layout_to_kernel_flags as _moe_layout_to_kernel_flags, -) -from .lora_layout_metadata import ( - load_lora_layout_metadata as _load_lora_layout_metadata, + all_false_flags as _moe_all_false_shared_flags, + layout_to_kernel_flags as _moe_layout_to_kernel_flags, + load_lora_layout_metadata as _load_lora_layout_metadata, )Then update the new method signature at line 1260:
- def get_moe_shared_flags(self, uid: str) -> Dict[str, bool]: + def get_moe_shared_flags(self, uid: str) -> dict[str, bool]:And line 751:
- self._uid_to_moe_shared_flags: Dict[str, Dict[str, bool]] = {} + self._uid_to_moe_shared_flags: dict[str, dict[str, bool]] = {}As per coding guidelines, prefer built-in types over
typing.Dict.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/lora_manager.py` around lines 27 - 35, Replace legacy typing generics with built-in generics: update the imports and all type hints that use typing.Dict to use built-in dict for the new symbols (_moe_all_false_shared_flags, _moe_layout_to_kernel_flags, _load_lora_layout_metadata). Also update the function signatures that currently use Dict in the new code paths (the two updated method signatures referenced in the review) to use dict instead of typing.Dict so type hints use Python 3.10+ built-ins.tensorrt_llm/lora_layout_metadata.py (1)
30-32: ⚡ Quick winPrefer Python 3.10+ built-in generic types.
The codebase targets Python 3.10+, so
dict,list, etc. can be used directly in type hints without importing fromtyping. ReplaceDict,Optional,Callable, andIterablewith their built-in equivalents orcollections.abcimports where appropriate.♻️ Refactor to use built-in types
-from typing import Callable, Dict, Iterable, Optional +from collections.abc import Callable, Iterable # Shared-side value for a single module: "A", "B", or None for per-expert. -SharedSide = Optional[str] +SharedSide = str | NoneThen update all function signatures:
Dict[str, str]→dict[str, str]Dict[str, bool]→dict[str, bool]Optional[Dict[str, SharedSide]]→dict[str, SharedSide] | NoneCallable[[str], Dict[str, bool]]→Callable[[str], dict[str, bool]]As per coding guidelines, prefer built-in types
list,dict,tupleover legacytyping.List,typing.Dict,typing.Tuple; use|syntax instead oftyping.Union.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/lora_layout_metadata.py` around lines 30 - 32, The file imports legacy typing aliases (Dict, Optional, Callable, Iterable); update to Python 3.10+ built-ins and collections.abc where appropriate: remove Dict/Optional/Callable/Iterable from the typing import and replace uses like Dict[str, str] -> dict[str, str], Dict[str, bool] -> dict[str, bool], Optional[Dict[str, SharedSide]] -> dict[str, SharedSide] | None, and Callable[[str], Dict[str, bool]] -> Callable[[str], dict[str, bool]] (import Callable/Iterable from collections.abc if you need runtime ABCs); update all function signatures and annotations in this module to use these built-in generics and the | union syntax.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py`:
- Line 23: Import ordering is incorrect: move the new import
"check_moe_lora_supported" into the group of other parent-package imports (the
other "from ...peft.*" / "from ...*" imports at the top of create_moe.py) so it
follows the project's isort grouping rules; locate the line "from
...peft.lora.validation import check_moe_lora_supported" and place it adjacent
to the existing parent-level imports (near the other "...peft..." imports) to
restore correct import group ordering.
In `@tensorrt_llm/lora_layout_metadata.py`:
- Around line 154-183: The docstring for layout_to_kernel_flags violates D205
because the summary and description are not separated by a single blank line;
update the docstring so the first line is a concise one-line summary (e.g.,
"Translate per-module shared-side map into kernel flags.") followed by a single
blank line and then the detailed description/args/returns block, preserving
existing content and triple-quote style for layout_to_kernel_flags.
In `@tensorrt_llm/lora_manager.py`:
- Around line 1260-1277: The docstring for get_moe_shared_flags violates D205
since there's no blank line between the one-line summary and the rest of the
description; update the get_moe_shared_flags method's docstring to insert a
single blank line after the summary line (so the short summary line is followed
by an empty line before the longer description and parameter/return details),
ensuring the opening and closing triple quotes remain correctly positioned and
the rest of the text (including the list of keys and reference to
lora_layout_metadata.py) is preserved.
In `@tests/unittest/_torch/lora/test_lora_layout_metadata.py`:
- Line 234: The file tests/unittest/_torch/lora/test_lora_layout_metadata.py is
missing a trailing newline at EOF; ensure the file ends with exactly one blank
line by adding a single newline character after the last line (so the file
terminates with a final newline).
---
Outside diff comments:
In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py`:
- Around line 182-203: The LoRA validator is using the global quant_config while
the backend decision uses the per-layer quant mode; update the call to
check_moe_lora_supported in create_moe.py to pass the per-layer quant mode (from
effective_quant_config.layer_quant_mode) instead of the top-level quant_config
so both checks use the same per-layer quantization view; ensure you handle
effective_quant_config being None (pass None or a sensible default quant_mode)
and keep the resolved_backend logic (resolved_backend variable and layer_idx)
unchanged so diagnostics still map non-Cutlass resolutions back to the requested
backend name.
---
Nitpick comments:
In `@tensorrt_llm/_torch/peft/lora/moe_layout.py`:
- Line 9: The type annotations in moe_layout.py use legacy typing aliases
(typing.Dict, typing.Tuple, Optional) in function/method return and parameter
annotations; replace all occurrences with built-in generics (dict, tuple) and
use PEP‑604 union syntax for optional types (e.g., int | None) or plain None
union (e.g., torch.Tensor | None) — update each signature and any import so that
typing.Dict/typing.Tuple/Optional are removed and signatures like -> Dict[str,
torch.Tensor] become -> dict[str, torch.Tensor], Tuple[...] become tuple[...],
and Optional[int] becomes int | None (ensure any imports from typing are cleaned
up accordingly).
In `@tensorrt_llm/_torch/peft/lora/validation.py`:
- Around line 16-24: Replace legacy typing generics with built-in generics and
remove unused typing imports: change the return annotation of _normalize_targets
from typing.Set[str] to set[str], update any Optional[...] occurrences to the
modern X | None form, and remove unused imports like Set and Optional from the
top; update the import line so only Iterable is imported from typing (or remove
typing entirely if not needed) while leaving references to LoraConfig and
MOE_LORA_MODULE_NAMES untouched.
In `@tensorrt_llm/lora_layout_metadata.py`:
- Around line 30-32: The file imports legacy typing aliases (Dict, Optional,
Callable, Iterable); update to Python 3.10+ built-ins and collections.abc where
appropriate: remove Dict/Optional/Callable/Iterable from the typing import and
replace uses like Dict[str, str] -> dict[str, str], Dict[str, bool] -> dict[str,
bool], Optional[Dict[str, SharedSide]] -> dict[str, SharedSide] | None, and
Callable[[str], Dict[str, bool]] -> Callable[[str], dict[str, bool]] (import
Callable/Iterable from collections.abc if you need runtime ABCs); update all
function signatures and annotations in this module to use these built-in
generics and the | union syntax.
In `@tensorrt_llm/lora_manager.py`:
- Around line 27-35: Replace legacy typing generics with built-in generics:
update the imports and all type hints that use typing.Dict to use built-in dict
for the new symbols (_moe_all_false_shared_flags, _moe_layout_to_kernel_flags,
_load_lora_layout_metadata). Also update the function signatures that currently
use Dict in the new code paths (the two updated method signatures referenced in
the review) to use dict instead of typing.Dict so type hints use Python 3.10+
built-ins.
In `@tests/unittest/others/test_lora_manager.py`:
- Around line 188-196: The function _create_dummy_moe_hf_lora_adapter has an
implicitly optional parameter layout_metadata: dict = None; update its signature
to make the Optional explicit using PEP 604 syntax (e.g., layout_metadata: dict
| None = None) so the type is explicit and compliant; modify only the parameter
annotation in _create_dummy_moe_hf_lora_adapter (no behavioral changes) and run
tests to confirm typing annotations are accepted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f59714d8-4b76-4620-a880-dced106c4d2c
📒 Files selected for processing (10)
tensorrt_llm/_torch/modules/fused_moe/create_moe.pytensorrt_llm/_torch/peft/lora/moe_layout.pytensorrt_llm/_torch/peft/lora/validation.pytensorrt_llm/_torch/pyexecutor/model_engine.pytensorrt_llm/lora_layout_metadata.pytensorrt_llm/lora_manager.pytests/unittest/_torch/lora/test_lora_layout_metadata.pytests/unittest/_torch/lora/test_moe_layout.pytests/unittest/_torch/lora/test_moe_lora_validator.pytests/unittest/others/test_lora_manager.py
e0a26b8 to
7b9299a
Compare
|
/bot run --disable-fail-fast |
7b9299a to
89dfc5e
Compare
|
PR_Github #51137 [ run ] triggered by Bot. Commit: |
|
/bot run --disable-fail-fast |
|
PR_Github #51141 [ run ] triggered by Bot. Commit: |
|
PR_Github #51137 [ run ] completed with state |
|
PR_Github #51141 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #51213 [ run ] triggered by Bot. Commit: |
d450609 to
222bb44
Compare
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
45d03e9 to
383ed54
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #51213 [ run ] completed with state
|
|
PR_Github #51221 [ run ] triggered by Bot. Commit: |
Description
This MR is in preparation for #14763 where the goal is to support routed-expert LoRA on Cutlass FusedMoE backend.
Background:
This MR lands Python helpers to auto-detect and validate "sharedness". A follow-up kernel/integration MR wires the fused-MoE op so that the shared side can be stored once without replication in the cache.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.Summary by CodeRabbit
New Features
Tests