Fix score calculation for MOE Learable by chang-wenbin · Pull Request #7902 · PaddlePaddle/FastDeploy

chang-wenbin · 2026-05-24T08:59:30Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-24T08:59:39Z

Thanks for your contribution!

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-24 17:20:14

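📋 Review Summary

PR overview: Moves the per-expert scaling for routed_scaling_factor_learnable in MOE Learnable routing from before the moe_expert_dispatch call to after it, fixing a bug in which the scaled result was overwritten and the scaling had no effect.

Scope of changes: fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py (cutlass MoE backend, apply_tp function)

Impact tag: [OP]

Issues

Level	File	Summary
🟡 Suggestion	`fused_moe_cutlass_backend.py:483`	Other MoE backends may have the same `routed_scaling_factor_learnable` defect; checking them as well is recommended
📝 PR convention	-	Title lacks an official tag; the Motivation / Modifications / Accuracy Tests sections are all empty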

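📝 PR Convention Check

The title lacks an official tag (the current title "Fix score calculation for MOE Learable" has no [Tag] prefix, and "Learable" is misspelled). The Motivation, Modifications, Usage or Command, and Accuracy Tests sections of the PR description are empty or contain only placeholder comments, which does not meet the template requirements.

Suggested title (can be copied directly):

[BugFix] Fix score calculation for MOE Learnable routing

Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):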

## Motivation
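When `routed_scaling_factor_learnable` is True, the original code multiplied `topk_weights` by the per-expert scale **before** calling `moe_expert_dispatch`. However, `moe_expert_dispatch(topk_only_mode=True)` recomputes `topk_weights` internally and returns new values, so the earlier scaling was overwritten and had no effect. This produced incorrect scores in the MOE Learnable case.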

## Modifications
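- `fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py`: Moved the per-expert scaling logic for `routed_scaling_factor_learnable` from before the `moe_expert_dispatch` call to after it, so that the scaling applies to the `topk_weights` that are actually used. Also changed the `topk_weights`/`topk_idx` returned by `get_moe_scores` to be discarded (`_`, `__`), because both values are overwritten by the return values of `moe_expert_dispatch`.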

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

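Overall Assessment

The fix is logically correct. The original code applied the learnable scaling to topk_weights before moe_expert_dispatch, which overwrites topk_weights, so the scaling had no effect. This PR moves the scaling to after the dispatch, so the weights that are actually used are scaled correctly. Checking the other MoE backends (deepgemm / triton / marlin / wint2 / blackwell) for the same defect is recommended.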
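The ordering problem described in this review can be illustrated with a minimal, self-contained sketch. Everything in it is a toy stand-in: `toy_dispatch`, `scale_per_expert`, `route_buggy`, and `route_fixed` are hypothetical names, not FastDeploy APIs, and the real `moe_expert_dispatch` does considerably more than pick a top-k. The sketch only assumes, as the review states, that the dispatch call returns freshly computed `topk_weights`.

```python
# Toy illustration only: these helpers are hypothetical, not FastDeploy code.

def toy_dispatch(gate_scores, top_k):
    """Stand-in for a dispatch call that recomputes top-k ids and weights."""
    ranked = sorted(range(len(gate_scores)), key=lambda e: gate_scores[e], reverse=True)
    topk_idx = ranked[:top_k]
    topk_weights = [gate_scores[e] for e in topk_idx]
    return topk_idx, topk_weights


def scale_per_expert(topk_idx, topk_weights, expert_scale):
    """Multiply each selected weight by the scale of its expert."""
    return [w * expert_scale[e] for e, w in zip(topk_idx, topk_weights)]


def route_buggy(gate_scores, expert_scale, top_k):
    # Scale first ...
    topk_idx, topk_weights = toy_dispatch(gate_scores, top_k)
    topk_weights = scale_per_expert(topk_idx, topk_weights, expert_scale)
    # ... then dispatch returns fresh weights, so the scaled values are lost.
    topk_idx, topk_weights = toy_dispatch(gate_scores, top_k)
    return topk_idx, topk_weights


def route_fixed(gate_scores, expert_scale, top_k):
    # Dispatch first, then scale the weights that will actually be used.
    topk_idx, topk_weights = toy_dispatch(gate_scores, top_k)
    topk_weights = scale_per_expert(topk_idx, topk_weights, expert_scale)
    return topk_idx, topk_weights


if __name__ == "__main__":
    scores = [0.1, 0.5, 0.25, 0.15]
    scale = [2.0, 2.0, 4.0, 1.0]
    print(route_buggy(scores, scale, top_k=2))  # ([1, 2], [0.5, 0.25])
    print(route_fixed(scores, scale, top_k=2))  # ([1, 2], [1.0, 1.0])
```

In the buggy ordering the output is identical to running with no per-expert scale at all, which matches the symptom the review describes.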

PaddlePaddle-bot · 2026-05-24T09:23:48Z

                topk_only_mode=True,
            )
+
+            if layer.routed_scaling_factor_learnable:


🟡 Suggestion: Please also check the other MoE backends (fused_moe_deepgemm_backend.py, fused_moe_triton_backend.py, fused_moe_marlin_backend.py, fused_moe_wint2_backend.py, fused_moe_blackwell_backend.py) for the same routed_scaling_factor_learnable defect, that is, scaling applied to topk_weights before the moe_expert_dispatch call, which then overwrites topk_weights.

Fix strategy: in each backend, likewise move the scaling logic to after moe_expert_dispatch returns.
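One way to start the suggested audit is a crude text scan that flags files where the `routed_scaling_factor_learnable` check first appears before the first `moe_expert_dispatch(` call. This is only a rough sketch: the helper names `scaling_precedes_dispatch` and `scan_backends` are hypothetical, the `fused_moe_*_backend.py` glob pattern is an assumption based on the file names listed above, and the heuristic compares first occurrences per file rather than per function, so any hit still needs manual review.

```python
from pathlib import Path

# Identifiers taken from the review text; the scan itself is a heuristic.
FLAG = "routed_scaling_factor_learnable"
DISPATCH = "moe_expert_dispatch("


def scaling_precedes_dispatch(source: str) -> bool:
    """Return True if the flag first appears before the first dispatch call."""
    flag_pos = source.find(FLAG)
    dispatch_pos = source.find(DISPATCH)
    if flag_pos == -1 or dispatch_pos == -1:
        return False  # one of the two is absent, nothing to compare
    return flag_pos < dispatch_pos


def scan_backends(moe_dir: str) -> list:
    """List backend files in moe_dir that the heuristic flags for review."""
    return sorted(
        path.name
        for path in Path(moe_dir).glob("fused_moe_*_backend.py")
        if scaling_precedes_dispatch(path.read_text(encoding="utf-8"))
    )
```

For example, `scan_backends("fastdeploy/model_executor/layers/moe")` would return the names of candidate files to inspect by hand. A file that merely mentions the flag earlier for an unrelated reason will also be flagged.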

fix_moe_learable-score1

ec7b457

chang-wenbin had a problem deploying to Metax_ci May 24, 2026 08:59 — with GitHub Actions Failure

PaddlePaddle-bot reviewed May 24, 2026

chang-wenbin wants to merge 1 commit into PaddlePaddle:develop from chang-wenbin:fix_moe_learable-score11
