@@ -446,7 +446,7 @@ def apply_tp(
gate_out = gate_out.cast("float32")
if fc1_latent_proj is not None:
x = fc1_latent_proj(x)
-gate_out, topk_weights, topk_idx = get_moe_scores(
+gate_out, _, __ = get_moe_scores(
gate_out,
layer.n_group,
layer.topk_group,
@@ -458,11 +458,6 @@ def apply_tp(
use_fused_cast=use_fused,
)

-if layer.routed_scaling_factor_learnable:
-    safe_topk_indices = paddle.clip(topk_idx, min=0)
-    gathered_scales = F.embedding(safe_topk_indices, layer.per_expert_scale.unsqueeze(1)).squeeze(-1)
-    topk_weights = topk_weights * gathered_scales
-
(
permute_input,
token_nums_per_expert,
@@ -484,6 +479,12 @@ def apply_tp(
self.moe_quant_type,
topk_only_mode=True,
)

if layer.routed_scaling_factor_learnable:

🟡 Suggestion: Please also check whether the other MoE backends (fused_moe_deepgemm_backend.py, fused_moe_triton_backend.py, fused_moe_marlin_backend.py, fused_moe_wint2_backend.py, fused_moe_blackwell_backend.py) have the same routed_scaling_factor_learnable bug, i.e. the scaling is applied to topk_weights before the moe_expert_dispatch call, and that call then overwrites topk_weights.

Fix: in each of those backends, likewise move the scaling logic to after moe_expert_dispatch returns.

+    safe_topk_indices = paddle.clip(topk_idx, min=0)
+    gathered_scales = F.embedding(safe_topk_indices, layer.per_expert_scale.unsqueeze(1)).squeeze(-1)
+    topk_weights = topk_weights * gathered_scales
+
else:
gate_out = gate_out.cast("float32")
if fc1_latent_proj is not None:
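
The ordering problem the reviewer describes, and the per-expert scaling this change moves, can be sketched in plain Python. This is a hypothetical model, not the backend code: `hypothetical_dispatch` stands in for `moe_expert_dispatch(..., topk_only_mode=True)` and is assumed to simply recompute the top-k weights and expert indices from the gate scores; Python lists replace Paddle tensors. `apply_per_expert_scale` mirrors the `paddle.clip` + `F.embedding` gather, where an embedding lookup into a `[num_experts, 1]` table is just `per_expert_scale[idx]`, and negative indices (assumed here to mark unused slots) are clipped to 0 first.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def hypothetical_dispatch(
    gate_scores: Sequence[Sequence[float]], k: int
) -> Tuple[Matrix, List[List[int]]]:
    """Hypothetical stand-in for moe_expert_dispatch(..., topk_only_mode=True).

    Recomputes top-k weights and expert indices from the gate scores, so any
    topk_weights computed before this call are replaced by its result.
    """
    weights: Matrix = []
    indices: List[List[int]] = []
    for row in gate_scores:
        # Experts sorted by descending score; keep the first k.
        top = sorted(range(len(row)), key=lambda e: row[e], reverse=True)[:k]
        indices.append(top)
        weights.append([row[e] for e in top])
    return weights, indices


def apply_per_expert_scale(
    topk_weights: Matrix, topk_idx: List[List[int]], per_expert_scale: Sequence[float]
) -> Matrix:
    """Multiply each routing weight by its expert's learned scale.

    Mirrors paddle.clip(topk_idx, min=0) followed by the F.embedding gather:
    negative indices are clipped to 0 before the lookup.
    """
    return [
        [w * per_expert_scale[max(i, 0)] for w, i in zip(w_row, i_row)]
        for w_row, i_row in zip(topk_weights, topk_idx)
    ]


def buggy_order(gate_scores, k, per_expert_scale) -> Matrix:
    # First call plays the role of get_moe_scores; scale right away
    # (as the removed lines did)...
    w, idx = hypothetical_dispatch(gate_scores, k)
    w = apply_per_expert_scale(w, idx, per_expert_scale)
    # ...then the dispatch call rebinds w, discarding the scaling.
    w, idx = hypothetical_dispatch(gate_scores, k)
    return w


def fixed_order(gate_scores, k, per_expert_scale) -> Matrix:
    # Scale only after dispatch has produced the final weights and indices.
    w, idx = hypothetical_dispatch(gate_scores, k)
    return apply_per_expert_scale(w, idx, per_expert_scale)
```

With gate scores `[[0.1, 0.6, 0.3]]`, `k=2` and scales `[1.0, 2.0, 0.5]`, `fixed_order` yields `[[1.2, 0.15]]` (up to float rounding), while `buggy_order` returns the unscaled `[[0.6, 0.3]]` because the second dispatch call rebinds the weights.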