Fix score calculation for MOE Learable #7902

Open
chang-wenbin wants to merge 1 commit into PaddlePaddle:develop from chang-wenbin:fix_moe_learable-score11

@chang-wenbin
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented May 24, 2026

Thanks for your contribution!

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-24 17:20:14

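📋 Review Summary

PR overview: Moves the per-expert scaling for routed_scaling_factor_learnable in MOE Learnable routing from before the moe_expert_dispatch call to after it, fixing a bug where the scaled result was overwritten and therefore had no effect.

Scope of changes: fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py (cutlass MoE backend, apply_tp function)

Impact tag: [OP]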

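Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | fused_moe_cutlass_backend.py:483 | Other MoE backends may have the same routed_scaling_factor_learnable defect; checking them as well is recommended |
| 📝 PR convention | | The title lacks an official Tag; the Motivation / Modifications / Accuracy Tests sections are all empty |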

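📝 PR Convention Check

The title lacks an official Tag (the current title, "Fix score calculation for MOE Learable", has no [Tag] prefix, and "Learable" is a misspelling). In the PR description, the Motivation, Modifications, Usage or Command, and Accuracy Tests sections are all empty or contain only placeholder comments, which does not meet the template requirements.

Suggested title (can be copied directly):

  • [BugFix] Fix score calculation for MOE Learnable routing

Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):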

## Motivation
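When `routed_scaling_factor_learnable` is True, the original code multiplies `topk_weights` by the per-expert scale **before** calling `moe_expert_dispatch`. However, `moe_expert_dispatch(topk_only_mode=True)` recomputes and returns new `topk_weights` internally, so the earlier scaling is overwritten and has no effect. The result is an incorrect score calculation in the MOE Learnable scenario.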

## Modifications
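- `fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py`: move the per-expert scaling for `routed_scaling_factor_learnable` from before the `moe_expert_dispatch` call to after it, so that the scaling applies to the `topk_weights` that are actually used. Also discard the `topk_weights`/`topk_idx` returned by `get_moe_scores` (as `_`, `__`), because both values are overwritten by the return values of `moe_expert_dispatch`.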

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

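Overall Evaluation

The fix is logically correct. The original code applied the learnable scaling to topk_weights before moe_expert_dispatch, which overwrites topk_weights, so the scaling had no effect. This PR moves the scaling after the dispatch, so the weights that are actually used are scaled correctly. It is recommended to also check the other MoE backends (deepgemm / triton / marlin / wint2 / blackwell) for the same defect.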
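
The ordering problem described above can be illustrated with a minimal plain-Python sketch. Everything here is a toy stand-in: `toy_get_moe_scores`, `toy_dispatch`, `scale`, `buggy_order`, and `fixed_order` are hypothetical names, not FastDeploy functions, and the real `moe_expert_dispatch` kernel does considerably more than recompute top-k weights. The sketch only assumes, as the review states, that the dispatch step returns fresh `topk_weights` that replace any earlier value.

```python
# Toy stand-ins only: these are NOT the FastDeploy kernels, just plain-Python
# functions that mimic the data flow described in the review.

def toy_get_moe_scores(gate_out, top_k):
    """Return (topk_weights, topk_idx) for one token from raw gate scores."""
    ranked = sorted(range(len(gate_out)), key=lambda i: gate_out[i], reverse=True)
    topk_idx = ranked[:top_k]
    return [gate_out[i] for i in topk_idx], topk_idx


def toy_dispatch(gate_out, top_k):
    """Mimic a dispatch step that recomputes the top-k weights internally."""
    return toy_get_moe_scores(gate_out, top_k)


def scale(topk_weights, topk_idx, per_expert_scale):
    """Multiply each selected weight by the scale of its expert."""
    return [w * per_expert_scale[i] for w, i in zip(topk_weights, topk_idx)]


def buggy_order(gate_out, top_k, per_expert_scale):
    topk_weights, topk_idx = toy_get_moe_scores(gate_out, top_k)
    # Scaling is applied here ...
    topk_weights = scale(topk_weights, topk_idx, per_expert_scale)
    # ... and then thrown away, because dispatch returns fresh weights.
    topk_weights, topk_idx = toy_dispatch(gate_out, top_k)
    return topk_weights, topk_idx


def fixed_order(gate_out, top_k, per_expert_scale):
    # The first pair of return values is discarded, as in the PR description.
    _, __ = toy_get_moe_scores(gate_out, top_k)
    topk_weights, topk_idx = toy_dispatch(gate_out, top_k)
    # Scaling now acts on the weights that are actually used downstream.
    topk_weights = scale(topk_weights, topk_idx, per_expert_scale)
    return topk_weights, topk_idx
```

With gate scores `[0.1, 0.5, 0.25, 0.125]`, `top_k=2`, and per-expert scales `[2.0, 4.0, 8.0, 1.0]`, the buggy ordering returns the unscaled weights `[0.5, 0.25]`, while the fixed ordering returns `[2.0, 2.0]`.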

topk_only_mode=True,
)

if layer.routed_scaling_factor_learnable:
🟡 Suggestion: Please also check the other MoE backends (fused_moe_deepgemm_backend.py, fused_moe_triton_backend.py, fused_moe_marlin_backend.py, fused_moe_wint2_backend.py, fused_moe_blackwell_backend.py) for the same routed_scaling_factor_learnable defect, that is, scaling topk_weights before the moe_expert_dispatch call, which then overwrites topk_weights.

Fix strategy: in each backend, likewise move the scaling logic to after moe_expert_dispatch returns.
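
As a rough aid for that check, a text-based heuristic could flag files in which the scaling flag is mentioned before the first `moe_expert_dispatch(` call. This is only a sketch under loud assumptions: `scaling_precedes_dispatch` is a hypothetical helper, it looks at raw source text rather than control flow, and it will produce false positives (for example, when the flag is merely read in an initializer) and false negatives (for example, when a backend dispatches through a different function). A hit means "read this file by hand", not "this file has the bug".

```python
import re

# Matches a call (or definition) of moe_expert_dispatch in source text.
DISPATCH_CALL = re.compile(r"\bmoe_expert_dispatch\s*\(")
SCALE_FLAG = "routed_scaling_factor_learnable"


def scaling_precedes_dispatch(source: str) -> bool:
    """Heuristic: True if the flag is mentioned before the first dispatch call."""
    call = DISPATCH_CALL.search(source)
    flag_pos = source.find(SCALE_FLAG)
    if call is None or flag_pos == -1:
        # Nothing to compare, so nothing to flag.
        return False
    return flag_pos < call.start()


if __name__ == "__main__":
    import pathlib
    import sys

    # Usage: pass the backend file paths to inspect on the command line.
    for path in sys.argv[1:]:
        text = pathlib.Path(path).read_text(encoding="utf-8")
        if scaling_precedes_dispatch(text):
            print(f"review manually: {path}")
```

The function takes source text rather than a path so that it can be exercised on small strings without touching the repository.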
