Fix score calculation for MOE Learable#7902
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-24 17:20:14
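📋 Review Summary
PR overview: Moves the per-expert scaling applied for routed_scaling_factor_learnable in MOE Learnable routing from before the moe_expert_dispatch call to after it, fixing a bug in which the scaled result was overwritten and the scaling had no effect.
Scope of change: fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py (cutlass MoE backend, apply_tp function)
Impact tag: [OP]
Issues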
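| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | fused_moe_cutlass_backend.py:483 | Other MoE backends may have the same routed_scaling_factor_learnable defect; recommend checking them as well |
| 📝 PR conventions | - | Title lacks an official tag; the Motivation / Modifications / Accuracy Tests sections are all empty |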
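📝 PR Conventions Check
The title lacks an official tag (the current title "Fix score calculation for MOE Learable" has no [Tag] prefix, and "Learable" is misspelled). In the PR description, the Motivation, Modifications, Usage or Command, and Accuracy Tests sections are all empty or contain only placeholder comments, which does not meet the template requirements.
Suggested title (can be copied directly):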
[BugFix] Fix score calculation for MOE Learnable routing
Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):
## Motivation
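When `routed_scaling_factor_learnable` is True, the original code multiplied `topk_weights` by the per-expert scale **before** calling `moe_expert_dispatch`. However, `moe_expert_dispatch(topk_only_mode=True)` recomputes `topk_weights` internally and returns new values, so the earlier scaling was overwritten and had no effect at all, which ultimately produced incorrect scores in the MOE Learnable scenario.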
## Modifications
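- `fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py`: Moved the per-expert scaling logic for `routed_scaling_factor_learnable` from before the `moe_expert_dispatch` call to after it, so that the scaling is applied to the `topk_weights` that are actually used. Also discarded the `topk_weights`/`topk_idx` returned by `get_moe_scores` (`_`, `__`), since both values are overwritten by the return values of `moe_expert_dispatch`.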
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
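- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The fix is logically correct: the original code applied the learnable scaling to topk_weights before moe_expert_dispatch, which overwrites them, so the scaling had no effect. This change moves the scaling to after dispatch, ensuring that the weights actually used are scaled correctly. Recommend also checking the other MoE backends (deepgemm / triton / marlin / wint2 / blackwell) for the same defect.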
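The ordering problem described in this review can be illustrated with a minimal sketch in plain Python. The helpers `fake_get_scores` and `fake_dispatch` are hypothetical stand-ins invented for this illustration; they are not the real `get_moe_scores` or `moe_expert_dispatch`, and the only behavior assumed here is the one stated in the review: the dispatch call returns freshly computed top-k weights.

```python
def fake_get_scores(gate_scores, top_k):
    """Hypothetical stand-in for a scoring helper: pick the top-k experts."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    topk_idx = ranked[:top_k]
    topk_weights = [gate_scores[i] for i in topk_idx]
    return topk_weights, topk_idx


def fake_dispatch(gate_scores, top_k):
    """Hypothetical stand-in for a dispatch op that recomputes top-k weights."""
    return fake_get_scores(gate_scores, top_k)


def route_buggy(gate_scores, per_expert_scale, top_k):
    # Scale first ...
    topk_weights, topk_idx = fake_get_scores(gate_scores, top_k)
    topk_weights = [w * per_expert_scale[i] for w, i in zip(topk_weights, topk_idx)]
    # ... then dispatch returns new weights, so the scaled values are discarded.
    topk_weights, topk_idx = fake_dispatch(gate_scores, top_k)
    return topk_weights, topk_idx


def route_fixed(gate_scores, per_expert_scale, top_k):
    # Dispatch first, then scale the weights that are actually used downstream.
    topk_weights, topk_idx = fake_dispatch(gate_scores, top_k)
    topk_weights = [w * per_expert_scale[i] for w, i in zip(topk_weights, topk_idx)]
    return topk_weights, topk_idx
```

With any per-expert scale other than 1.0, `route_buggy` returns the unscaled weights while `route_fixed` returns the scaled ones, which is the difference a regression test for this PR could assert on.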
    topk_only_mode=True,
)

if layer.routed_scaling_factor_learnable:
🟡 Suggestion: Please also check the other MoE backends (fused_moe_deepgemm_backend.py, fused_moe_triton_backend.py, fused_moe_marlin_backend.py, fused_moe_wint2_backend.py, fused_moe_blackwell_backend.py) for the same routed_scaling_factor_learnable defect (that is, applying the scaling to topk_weights before the moe_expert_dispatch call, which then overwrites topk_weights).
Fix strategy: in each backend, likewise move the scaling logic to after moe_expert_dispatch returns.
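As a rough aid for that check, the following sketch flags source text in which the learnable-scaling flag is first referenced before the first dispatch call. It is a crude text heuristic written for this review, not part of FastDeploy: the function name `scaling_precedes_dispatch` is hypothetical, and a hit only indicates a file worth reading by hand, since the flag may legitimately appear earlier (for example in a constructor).

```python
SCALE_FLAG = "routed_scaling_factor_learnable"
DISPATCH_CALL = "moe_expert_dispatch("


def scaling_precedes_dispatch(source: str) -> bool:
    """Return True if the flag's first mention comes before the first dispatch call.

    Purely positional: it does not parse Python or track control flow.
    """
    flag_pos = source.find(SCALE_FLAG)
    call_pos = source.find(DISPATCH_CALL)
    if flag_pos == -1 or call_pos == -1:
        # One of the two markers is absent, so the ordering pattern cannot occur.
        return False
    return flag_pos < call_pos
```

To apply it, one could read each backend file's text (for instance with `pathlib.Path.read_text`) and pass it to the function, then manually review any file that returns True.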