
Add AI written qwen3_moe example#2887

Open
skyw wants to merge 8 commits into NVIDIA:main from skyw:vibe_qwen3

Conversation


@skyw skyw commented Apr 15, 2026

Description

An almost pure TE-module implementation of the Qwen3 MoE model.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Add a Qwen3 MoE model implemented using TE modules only
  • Add a simple test that matches outputs against the HF counterpart

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

skyw added 4 commits April 15, 2026 11:16
Signed-off-by: Hao Wu <skyw@nvidia.com>
@ksivaman ksivaman self-requested a review April 15, 2026 18:39
@greptile-apps
Contributor

greptile-apps bot commented Apr 15, 2026

Greptile Summary

This PR adds a new examples/pytorch/qwen3_moe/ directory with a single-GPU Qwen3 MoE implementation built entirely from TransformerEngine modules (te.MultiheadAttention, te.RMSNorm, te_ops.GroupedLinear/SwiGLU, te.moe_permute_with_probs/moe_unpermute) and a forward+backward numerical comparison test against HuggingFace. The architecture mapping is faithful to the HF reference and the TE API is used correctly throughout.
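For intuition, the router step summarized above (softmax over expert logits, then top-k selection) can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the PR: the function name is invented, and whether the selected probabilities are renormalized after top-k is a config detail omitted here.

```python
import math

def topk_route(logits, k):
    """Softmax over one token's expert logits, then pick the top-k experts.

    Returns (expert_ids, probs), where probs are the softmax weights of the
    chosen experts. Renormalization of the top-k probs is left out for brevity.
    """
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    softmax = [e / total for e in exps]
    # Rank experts by routing probability, keep the k best.
    ranked = sorted(range(len(logits)), key=lambda i: softmax[i], reverse=True)
    chosen = ranked[:k]
    return chosen, [softmax[i] for i in chosen]

# One token's router logits over 4 experts, top-2 routing.
experts, probs = topk_route([2.0, 0.5, 1.0, -1.0], k=2)
```

In the actual example these per-token choices are produced in batch form (merging probs, a routing map, and per-expert token counts) so they can feed `te.moe_permute_with_probs`.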

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style/documentation issues that do not affect runtime behavior.

The model implementation and weight-copy logic are correct for the default configuration (fuse_qkv_params=False). The two P2 findings are a misleading code comment and a dead code branch that is never reached with the current model setup. No P0 or P1 issues remain unaddressed.

examples/pytorch/qwen3_moe/test_vs_hf.py — dead "qkv" weight-copy branch (lines 106-109) should be removed or corrected before fuse_qkv_params=True is ever used.

Important Files Changed

| Filename | Overview |
| --- | --- |
| examples/pytorch/qwen3_moe/config.py | Frozen dataclass with HF-compatible Qwen3MoeConfig defaults; clean and straightforward. |
| examples/pytorch/qwen3_moe/model.py | Complete TE module implementation mapping HF Qwen3 MoE to TE equivalents; one misleading comment about CPU sync (P2). |
| examples/pytorch/qwen3_moe/test_vs_hf.py | Forward/backward weight-mapping test; contains a dead "qkv" weight-copy branch with incorrect GQA interleaved layout (P2), plus the already-flagged no-op data.copy_() on backward logits. |
| examples/pytorch/qwen3_moe/README.md | Clear module-mapping table and usage instructions; no issues. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["input_ids (B, S)"] --> B["embed_tokens → hidden_states (B, S, H)"]
    B --> C["RotaryPositionEmbedding → freqs"]
    C --> D{"For each DecoderLayer"}
    D --> E["residual = hidden_states"]
    E --> F["te.MultiheadAttention\n(fused LN + QKV + QK-norm + RoPE + attn + O-proj)"]
    F --> G["hidden_states = residual + attn_out"]
    G --> H["residual = hidden_states"]
    H --> I["te.RMSNorm (post_attention_layernorm)"]
    I --> J["Qwen3MoeBlock"]
    subgraph MoE ["Qwen3MoeBlock"]
        J1["hidden_flat (T, H)"] --> J2["Qwen3MoeRouter\n(softmax + top-k)"]
        J2 --> J3["merging_probs, routing_map,\ntokens_per_expert, router_logits"]
        J3 --> J4["te.moe_permute_with_probs\n→ permuted_input (T*k, H)"]
        J4 --> J5["te_ops.Sequential\nGroupedLinear → SwiGLU → GroupedLinear"]
        J5 --> J6["te.moe_unpermute\n→ output (T, H)"]
    end
    J --> J1
    J6 --> K["hidden_states = residual + moe_out"]
    K --> D
    D --> L["te.RMSNorm (final norm)"]
    L --> M["te.Linear (lm_head) → logits (B, S, V)"]
```
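The permute → grouped MLP → unpermute path in the flowchart can be illustrated with a tiny framework-free sketch. Names are illustrative (the real example uses `te.moe_permute_with_probs` and `te.moe_unpermute`), and the "experts" here are the identity function so the round trip is easy to follow; hidden size is 1 for brevity.

```python
def permute_by_expert(tokens, routing):
    """routing[t] = list of (expert_id, prob) pairs for token t.

    Returns token rows grouped by expert id (so each expert sees a contiguous
    block, as a grouped GEMM requires), plus bookkeeping to undo it.
    """
    entries = []  # (expert_id, token_idx, prob)
    for t, pairs in enumerate(routing):
        for e, p in pairs:
            entries.append((e, t, p))
    entries.sort(key=lambda x: x[0])  # stable sort: group rows by expert
    permuted = [tokens[t] for _, t, _ in entries]
    return permuted, entries

def unpermute(expert_out, entries, num_tokens):
    """Scatter expert outputs back to token order, weighted by router prob."""
    out = [0.0] * num_tokens
    for row, (_, t, p) in zip(expert_out, entries):
        out[t] += p * row
    return out

tokens = [1.0, 10.0]                      # two tokens, hidden size 1
routing = [[(0, 0.75), (1, 0.25)],        # token 0 → experts 0 and 1
           [(1, 1.0)]]                    # token 1 → expert 1 only
permuted, entries = permute_by_expert(tokens, routing)
# Identity "experts": output = input, so unpermute yields the prob-weighted sum.
merged = unpermute(permuted, entries, num_tokens=2)
```

With identity experts and probabilities summing to 1 per token, each token comes back unchanged, which is exactly the invariant the permute/unpermute pair must preserve around the real GroupedLinear → SwiGLU → GroupedLinear stack.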

Reviews (4). Last reviewed commit: "Merge branch 'main' into vibe_qwen3"

Comment thread examples/pytorch/qwen3_moe/test_vs_hf.py Outdated
Comment thread examples/pytorch/qwen3_moe/test_vs_hf.py
skyw and others added 4 commits April 15, 2026 12:30
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
