[TRTLLM-12507][feat] Add routed-expert LoRA helpers by brb-nv · Pull Request #14764 · NVIDIA/TensorRT-LLM

brb-nv · 2026-05-29T21:59:50Z

Description

This MR prepares for #14763, whose goal is to support routed-expert LoRA on the Cutlass FusedMoE backend.

Background:

In routed-expert MoE LoRA, a per-expert adapter stores a separate low-rank pair (A and B) for every expert, so each expert can apply a different correction.
A shared-outer adapter shares one matrix across all experts on the residual-stream side (typically A on the up/gate projections and B on the down projection), while the other side remains per-expert.
The math per routed token is unchanged; only how weights are stored and indexed differs.
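To make the storage difference concrete, here is a minimal sketch in plain Python. It assumes A has shape [rank, in_dim] and B has shape [out_dim, rank], omits LoRA scaling, and uses nested lists in place of tensors. All helper names are hypothetical and are not part of TensorRT-LLM. The sketch shows that replicating a shared A into every expert slot gives the same per-token delta as indexing one shared A.

```python
# Illustrative sketch only: nested lists stand in for tensors, scaling is
# omitted, and these helper names are hypothetical (not TensorRT-LLM APIs).
Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    # Row-major matrix-vector product: y = m @ v.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_delta(x: list[float], a: Matrix, b: Matrix) -> list[float]:
    # LoRA correction for one token: delta = B @ (A @ x),
    # with A of shape [rank, in_dim] and B of shape [out_dim, rank].
    return matvec(b, matvec(a, x))


def per_expert_delta(x, expert, a_per_expert, b_per_expert):
    # Per-expert adapter: both A and B are indexed by the routed expert.
    return lora_delta(x, a_per_expert[expert], b_per_expert[expert])


def shared_a_delta(x, expert, a_shared, b_per_expert):
    # Shared-outer adapter with a shared A (e.g. up/gate projections):
    # one A for all experts, B still indexed by the routed expert.
    return lora_delta(x, a_shared, b_per_expert[expert])


def shared_b_delta(x, expert, a_per_expert, b_shared):
    # Shared-outer adapter with a shared B (e.g. down projection):
    # A indexed by the routed expert, one B for all experts.
    return lora_delta(x, a_per_expert[expert], b_shared)


# Replicating the shared A into every expert slot gives the same result as
# storing it once; only the storage and indexing differ.
a_shared = [[1.0, 0.0], [0.0, 2.0]]        # rank 2, in_dim 2
b_experts = [[[1.0, 1.0]], [[2.0, -1.0]]]  # 2 experts, out_dim 1
x = [3.0, 4.0]
replicated = [a_shared] * len(b_experts)
for e in range(len(b_experts)):
    assert per_expert_delta(x, e, replicated, b_experts) == shared_a_delta(
        x, e, a_shared, b_experts)
```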

This MR lands Python helpers to auto-detect and validate "sharedness", that is, which side of each adapter (if any) is shared across experts. A follow-up kernel/integration MR will wire up the fused-MoE op so that the shared side is stored once in the cache instead of being replicated per expert.

Test Coverage

$ pytest tests/unittest/_torch/lora/test_moe_utils.py -s -v
$ pytest tests/unittest/others/test_lora_manager.py::TestLoraManagerMoeSharedFlags -s -v

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

New Features
- Added MoE LoRA (Mixture-of-Experts Low-Rank Adaptation) support, with per-expert weight sharing configurable through metadata
- Enforced validation so that MoE LoRA is used only with the CUTLASS backend and never with quantized base weights
Tests
- Added comprehensive unit tests for MoE LoRA layout parsing, validation, and parameter generation

coderabbitai · 2026-05-30T00:33:03Z

📝 Walkthrough

Walkthrough

This PR adds MoE LoRA shared-outer layout metadata support. It introduces a lora_layout.json schema for declaring which LoRA matrix side is shared across experts, validates that MoE LoRA runs only on the CUTLASS backend with unquantized weights, integrates metadata loading into the LoRA manager, and merges per-adapter flags during inference for consumption by the fused-MoE kernel.

Changes

MoE LoRA Shared-Outer Layout

Layer / File(s)	Summary
MoE LoRA Layout Definitions and Reference Helpers `tensorrt_llm/_torch/peft/lora/moe_layout.py`, `tests/unittest/_torch/lora/test_moe_layout.py`	Introduces `MOE_LORA_MODULES` (canonical routed-expert adapter names), `DEFAULT_SHARED_SIDE` (module-to-side mapping), `make_per_expert_lora()` (per-expert tensor generator with sharing modes), and `reference_moe_lora_delta()` (reference delta computation). Tests verify sharing semantics, shapes, seeding, and computation correctness.
MoE LoRA Validation and Constraints `tensorrt_llm/_torch/peft/lora/validation.py`, `tests/unittest/_torch/lora/test_moe_lora_validator.py`	Adds `has_moe_lora_targets()` to detect MoE LoRA config presence and `check_moe_lora_supported()` to enforce CUTLASS backend and unquantized base weights when MoE LoRA is targeted. Tests cover target detection, backend rejection, quantization rejection, and error messaging.
Layout Metadata Schema, Parsing, and Kernel Flag Conversion `tensorrt_llm/lora_layout_metadata.py`, `tests/unittest/_torch/lora/test_lora_layout_metadata.py`	Implements `lora_layout.json` schema with version checking, `parse_lora_layout()` validation, `layout_to_kernel_flags()` conversion to six fused-MoE boolean flags, and `merge_moe_shared_flags_for_batch()` for batch consistency. Tests verify schema validation, missing/malformed metadata handling, flag mapping, and batch merging logic.
LoRA Manager Metadata Loading and Storage `tensorrt_llm/lora_manager.py`, `tests/unittest/others/test_lora_manager.py`	Loads `lora_layout.json` per adapter during HF import, stores kernel flags in `_uid_to_moe_shared_flags`, and exposes `get_moe_shared_flags(uid)` for retrieval. Tests verify metadata propagation, all-false defaults, unknown UID handling, and malformed metadata rejection.
MoE Backend Resolution with LoRA Validation `tensorrt_llm/_torch/modules/fused_moe/create_moe.py`	Forces `CutlassFusedMoE` in BF16 non-flashinfer path and calls `check_moe_lora_supported()` post-resolution to validate MoE LoRA compatibility at construction time with resolved backend and layer index.
Model Engine Inference: Eager LoRA Parameter Assembly with MoE Flags `tensorrt_llm/_torch/pyexecutor/model_engine.py`	Refactors eager LoRA assembly to delegate to `_get_eager_lora_params_from_requests()`, queries `peft_cache_manager` for per-adapter flags, merges them via `merge_moe_shared_flags_for_batch()`, and attaches the result to `lora_params["moe_shared_flags"]` for fused-MoE kernel consumption.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

2ez4bz
byshiue
amitz-nv
syuoni
venkywonka

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.37% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title '[TRTLLM-12507][feat] Add routed-expert LoRA helpers' accurately describes the main change: adding Python helpers for routed-expert MoE LoRA with layout and validation support, which is the primary objective of this PR.
Description check	✅ Passed	The PR description provides clear context on the feature (routed-expert LoRA support), relevant background, specific test commands, and completion of the checklist.


coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tensorrt_llm/_torch/modules/fused_moe/create_moe.py (1)
182-203: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align MoE LoRA quantization validation with per-layer mode

create_moe.py computes has_quant from effective_quant_config.layer_quant_mode.has_any_quant(exclude_kv_cache=True) to decide whether to use CutlassFusedMoE.

tensorrt_llm/_torch/peft/lora/validation.py::check_moe_lora_supported decides LoRA support from quant_config.quant_mode (via quant_mode.has_any_quant(exclude_kv_cache=True)).
If quant_mode and layer_quant_mode differ under mixed/per-layer quantization, the validator can accept/reject based on a different “quantized-ness” than the backend-selection logic. Pass the per-layer quant mode (or a layer-derived quant_mode) into the validator for consistency.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py` around lines 182 - 203,
The LoRA validator is using the global quant_config while the backend decision
uses the per-layer quant mode; update the call to check_moe_lora_supported in
create_moe.py to pass the per-layer quant mode (from
effective_quant_config.layer_quant_mode) instead of the top-level quant_config
so both checks use the same per-layer quantization view; ensure you handle
effective_quant_config being None (pass None or a sensible default quant_mode)
and keep the resolved_backend logic (resolved_backend variable and layer_idx)
unchanged so diagnostics still map non-Cutlass resolutions back to the requested
backend name.

🧹 Nitpick comments (5)

tensorrt_llm/_torch/peft/lora/validation.py (1)
16-24: ⚡ Quick win

Use built-in set instead of typing.Set.

_normalize_targets returns Set[str]; prefer set[str]. Likewise Optional[...] can use X | None per the repo style.
♻️ Proposed change
-from typing import Iterable, Optional, Set
+from collections.abc import Iterable
+from typing import Optional
-def _normalize_targets(lora_target_modules: Iterable[str]) -> Set[str]:
+def _normalize_targets(lora_target_modules: Iterable[str]) -> set[str]:
As per coding guidelines: "Prefer built-in types list, dict, tuple over legacy typing.List, typing.Dict, typing.Tuple".
🤖 Prompt for AI Agents

In `@tensorrt_llm/_torch/peft/lora/validation.py` around lines 16 - 24, Replace
legacy typing generics with built-in generics and remove unused typing imports:
change the return annotation of _normalize_targets from typing.Set[str] to
set[str], update any Optional[...] occurrences to the modern X | None form, and
remove unused imports like Set and Optional from the top; update the import line
so only Iterable is imported from typing (or remove typing entirely if not
needed) while leaving references to LoraConfig and MOE_LORA_MODULE_NAMES
untouched.
tensorrt_llm/_torch/peft/lora/moe_layout.py (1)
9-9: ⚡ Quick win

Prefer built-in generics over typing.Dict/typing.Tuple.

This new module uses legacy typing.Dict/typing.Tuple aliases (e.g. Lines 16, 24, 41, 63, 88 return types). Per the repo guideline, use built-in dict/tuple (and int | None for Optional[int]).
♻️ Example
-from typing import Dict, Literal, Optional, Tuple
+from typing import Literal, Optional
-MOE_LORA_MODULES: Tuple[str, ...] = ("moe_h_to_4h", "moe_4h_to_h", "moe_gate")
+MOE_LORA_MODULES: tuple[str, ...] = ("moe_h_to_4h", "moe_4h_to_h", "moe_gate")
-DEFAULT_SHARED_SIDE: Dict[str, SharedSide] = {
+DEFAULT_SHARED_SIDE: dict[str, SharedSide] = {
(and -> Dict[str, torch.Tensor] → -> dict[str, torch.Tensor])
As per coding guidelines: "Prefer built-in types list, dict, tuple over legacy typing.List, typing.Dict, typing.Tuple".
🤖 Prompt for AI Agents

In `@tensorrt_llm/_torch/peft/lora/moe_layout.py` at line 9, The type annotations
in moe_layout.py use legacy typing aliases (typing.Dict, typing.Tuple, Optional)
in function/method return and parameter annotations; replace all occurrences
with built-in generics (dict, tuple) and use PEP‑604 union syntax for optional
types (e.g., int | None) or plain None union (e.g., torch.Tensor | None) —
update each signature and any import so that typing.Dict/typing.Tuple/Optional
are removed and signatures like -> Dict[str, torch.Tensor] become -> dict[str,
torch.Tensor], Tuple[...] become tuple[...], and Optional[int] becomes int |
None (ensure any imports from typing are cleaned up accordingly).
tests/unittest/others/test_lora_manager.py (1)
188-196: ⚡ Quick win

Make Optional explicit in function signature.

The parameter layout_metadata: dict = None has an implicit Optional type, which violates PEP 484. Use the | syntax to make it explicit.
♻️ Proposed fix
 def _create_dummy_moe_hf_lora_adapter(
     adapter_dir: Path,
     hidden_size: int = 64,
     intermediate_size: int = 128,
     rank: int = 8,
     num_layers: int = 2,
     num_experts: int = 4,
-    layout_metadata: dict = None,
+    layout_metadata: dict | None = None,
 ):
As per coding guidelines, use | syntax instead of typing.Union, and avoid implicit Optional by making None defaults explicit in the type.
🤖 Prompt for AI Agents

In `@tests/unittest/others/test_lora_manager.py` around lines 188 - 196, The
function _create_dummy_moe_hf_lora_adapter has an implicitly optional parameter
layout_metadata: dict = None; update its signature to make the Optional explicit
using PEP 604 syntax (e.g., layout_metadata: dict | None = None) so the type is
explicit and compliant; modify only the parameter annotation in
_create_dummy_moe_hf_lora_adapter (no behavioral changes) and run tests to
confirm typing annotations are accepted.
tensorrt_llm/lora_manager.py (1)
27-35: ⚡ Quick win

Prefer Python 3.10+ built-in generic types in new imports.

The three new imports from lora_layout_metadata can be consolidated into one statement, and the new type hints use legacy typing generics. Since this codebase targets Python 3.10+, prefer built-in dict over Dict in type hints throughout the new code.
♻️ Refactor imports
 from .lora_layout_metadata import (
-    all_false_flags as _moe_all_false_shared_flags,
-)
-from .lora_layout_metadata import (
-    layout_to_kernel_flags as _moe_layout_to_kernel_flags,
-)
-from .lora_layout_metadata import (
-    load_lora_layout_metadata as _load_lora_layout_metadata,
+    all_false_flags as _moe_all_false_shared_flags,
+    layout_to_kernel_flags as _moe_layout_to_kernel_flags,
+    load_lora_layout_metadata as _load_lora_layout_metadata,
 )
Then update the new method signature at line 1260:
-    def get_moe_shared_flags(self, uid: str) -> Dict[str, bool]:
+    def get_moe_shared_flags(self, uid: str) -> dict[str, bool]:
And line 751:
-        self._uid_to_moe_shared_flags: Dict[str, Dict[str, bool]] = {}
+        self._uid_to_moe_shared_flags: dict[str, dict[str, bool]] = {}
As per coding guidelines, prefer built-in types over typing.Dict.
🤖 Prompt for AI Agents

In `@tensorrt_llm/lora_manager.py` around lines 27 - 35, Replace legacy typing
generics with built-in generics: update the imports and all type hints that use
typing.Dict to use built-in dict for the new symbols
(_moe_all_false_shared_flags, _moe_layout_to_kernel_flags,
_load_lora_layout_metadata). Also update the function signatures that currently
use Dict in the new code paths (the two updated method signatures referenced in
the review) to use dict instead of typing.Dict so type hints use Python 3.10+
built-ins.
tensorrt_llm/lora_layout_metadata.py (1)
30-32: ⚡ Quick win

Prefer Python 3.10+ built-in generic types.

The codebase targets Python 3.10+, so dict, list, etc. can be used directly in type hints without importing from typing. Replace Dict, Optional, Callable, and Iterable with their built-in equivalents or collections.abc imports where appropriate.
♻️ Refactor to use built-in types
-from typing import Callable, Dict, Iterable, Optional
+from collections.abc import Callable, Iterable

 # Shared-side value for a single module: "A", "B", or None for per-expert.
-SharedSide = Optional[str]
+SharedSide = str | None
Then update all function signatures:

Dict[str, str] → dict[str, str]

Dict[str, bool] → dict[str, bool]

Optional[Dict[str, SharedSide]] → dict[str, SharedSide] | None

Callable[[str], Dict[str, bool]] → Callable[[str], dict[str, bool]]
As per coding guidelines, prefer built-in types list, dict, tuple over legacy typing.List, typing.Dict, typing.Tuple; use | syntax instead of typing.Union.
🤖 Prompt for AI Agents

In `@tensorrt_llm/lora_layout_metadata.py` around lines 30 - 32, The file imports
legacy typing aliases (Dict, Optional, Callable, Iterable); update to Python
3.10+ built-ins and collections.abc where appropriate: remove
Dict/Optional/Callable/Iterable from the typing import and replace uses like
Dict[str, str] -> dict[str, str], Dict[str, bool] -> dict[str, bool],
Optional[Dict[str, SharedSide]] -> dict[str, SharedSide] | None, and
Callable[[str], Dict[str, bool]] -> Callable[[str], dict[str, bool]] (import
Callable/Iterable from collections.abc if you need runtime ABCs); update all
function signatures and annotations in this module to use these built-in
generics and the | union syntax.

🤖 Prompt for all review comments with AI agents


Inline comments:
In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py`:
- Line 23: Import ordering is incorrect: move the new import
"check_moe_lora_supported" into the group of other parent-package imports (the
other "from ...peft.*" / "from ...*" imports at the top of create_moe.py) so it
follows the project's isort grouping rules; locate the line "from
...peft.lora.validation import check_moe_lora_supported" and place it adjacent
to the existing parent-level imports (near the other "...peft..." imports) to
restore correct import group ordering.

In `@tensorrt_llm/lora_layout_metadata.py`:
- Around line 154-183: The docstring for layout_to_kernel_flags violates D205
because the summary and description are not separated by a single blank line;
update the docstring so the first line is a concise one-line summary (e.g.,
"Translate per-module shared-side map into kernel flags.") followed by a single
blank line and then the detailed description/args/returns block, preserving
existing content and triple-quote style for layout_to_kernel_flags.

In `@tensorrt_llm/lora_manager.py`:
- Around line 1260-1277: The docstring for get_moe_shared_flags violates D205
since there's no blank line between the one-line summary and the rest of the
description; update the get_moe_shared_flags method's docstring to insert a
single blank line after the summary line (so the short summary line is followed
by an empty line before the longer description and parameter/return details),
ensuring the opening and closing triple quotes remain correctly positioned and
the rest of the text (including the list of keys and reference to
lora_layout_metadata.py) is preserved.

In `@tests/unittest/_torch/lora/test_lora_layout_metadata.py`:
- Line 234: The file tests/unittest/_torch/lora/test_lora_layout_metadata.py is
missing a trailing newline at EOF; ensure the file ends with exactly one blank
line by adding a single newline character after the last line (so the file
terminates with a final newline).
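For reference, a D205-compliant docstring separates the one-line summary from the extended description with exactly one blank line. The function below is a hypothetical placeholder that only illustrates that layout; it is not the real `get_moe_shared_flags`.

```python
def get_flags_example(uid: str) -> dict[str, bool]:
    """Return the fused-MoE shared-side flags for one LoRA adapter.

    Everything after the blank line above is the extended description,
    which is the layout D205 asks for: summary line, one blank line,
    then details.

    Args:
        uid: Adapter identifier.

    Returns:
        Mapping from flag name to bool.
    """
    # Placeholder body: only the docstring layout matters in this example.
    return {}
```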



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f59714d8-4b76-4620-a880-dced106c4d2c

📥 Commits

Reviewing files that changed from the base of the PR and between 74d7c3a and 2b57061.

📒 Files selected for processing (10)

tensorrt_llm/_torch/modules/fused_moe/create_moe.py
tensorrt_llm/_torch/peft/lora/moe_layout.py
tensorrt_llm/_torch/peft/lora/validation.py
tensorrt_llm/_torch/pyexecutor/model_engine.py
tensorrt_llm/lora_layout_metadata.py
tensorrt_llm/lora_manager.py
tests/unittest/_torch/lora/test_lora_layout_metadata.py
tests/unittest/_torch/lora/test_moe_layout.py
tests/unittest/_torch/lora/test_moe_lora_validator.py
tests/unittest/others/test_lora_manager.py

brb-nv · 2026-05-30T02:27:43Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-30T02:34:17Z

PR_Github #51137 [ run ] triggered by Bot. Commit: 89dfc5e

brb-nv · 2026-05-30T02:36:41Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-30T02:42:46Z

PR_Github #51141 [ run ] triggered by Bot. Commit: 89dfc5e

tensorrt-cicd · 2026-05-30T02:47:33Z

PR_Github #51137 [ run ] completed with state ABORTED. Commit: 89dfc5e

tensorrt-cicd · 2026-05-30T10:25:04Z

PR_Github #51141 [ run ] completed with state SUCCESS. Commit: 89dfc5e
/LLM/main/L0_MergeRequest_PR pipeline #40577 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

brb-nv · 2026-05-30T20:42:24Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-30T20:51:21Z

PR_Github #51213 [ run ] triggered by Bot. Commit: 89dfc5e

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

brb-nv · 2026-05-31T01:42:47Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-31T01:47:50Z

PR_Github #51213 [ run ] completed with state FAILURE. Commit: 89dfc5e
/LLM/main/L0_MergeRequest_PR pipeline #40637 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

tensorrt-cicd · 2026-05-31T01:49:00Z

PR_Github #51221 [ run ] triggered by Bot. Commit: 383ed54

github-actions Bot assigned brb-nv May 29, 2026

brb-nv changed the title ~~[TRTLLM-12507][feat] add MoE LoRA layout and validation helpers~~ [TRTLLM-12507][feat] Add MoE LoRA layout and validation helpers May 29, 2026

brb-nv force-pushed the user/brb/moe-lora-helpers branch from 7dd0c55 to 2b57061 May 30, 2026 00:19

brb-nv marked this pull request as ready for review May 30, 2026 00:23

brb-nv requested review from a team as code owners May 30, 2026 00:23

brb-nv requested review from QiJune, pcastonguay and venkywonka May 30, 2026 00:23

coderabbitai Bot reviewed May 30, 2026

Comment thread tensorrt_llm/_torch/modules/fused_moe/create_moe.py Outdated

Comment thread tensorrt_llm/lora_layout_metadata.py Outdated

Comment thread tensorrt_llm/lora_manager.py Outdated

Comment thread tests/unittest/_torch/lora/test_lora_layout_metadata.py Outdated

brb-nv force-pushed the user/brb/moe-lora-helpers branch 2 times, most recently from e0a26b8 to 7b9299a May 30, 2026 02:25

brb-nv force-pushed the user/brb/moe-lora-helpers branch from 7b9299a to 89dfc5e May 30, 2026 02:32

brb-nv force-pushed the user/brb/moe-lora-helpers branch 5 times, most recently from d450609 to 222bb44 May 31, 2026 01:11

brb-nv changed the title ~~[TRTLLM-12507][feat] Add MoE LoRA layout and validation helpers~~ [TRTLLM-12507][feat] Add routed-expert LoRA helpers May 31, 2026

[TRTLLM-12507][feat] Add routed-expert LoRA helpers

383ed54

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

brb-nv force-pushed the user/brb/moe-lora-helpers branch from 45d03e9 to 383ed54 May 31, 2026 01:29