
support mtp_decoder_input_detach #37

Merged
Jintao-Huang merged 2 commits into modelscope:main from Jintao-Huang:support_mtp_decoder_input_detach
Apr 18, 2026

Conversation

@Jintao-Huang
Collaborator

```shell
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
IMAGE_MAX_TOKEN_NUM=1024 \
VIDEO_MAX_TOKEN_NUM=128 \
FPS_MAX_FRAMES=12 \
megatron sft \
    --model Qwen/Qwen3.5-4B \
    --save_safetensors true \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
              'AI-ModelScope/LaTeX_OCR:human_handwrite#2000' \
    --model_author swift \
    --model_name swift-robot \
    --linear_decoupled_in_proj true \
    --load_from_cache_file true \
    --add_non_thinking_prefix true \
    --fp8_recipe blockwise \
    --fp8_format e4m3 \
    --fp8_param_gather true \
    --split_dataset_ratio 0.01 \
    --tuner_type full \
    --mtp_decoder_input_detach true \
    --tensor_model_parallel_size 2 \
    --micro_batch_size 1 \
    --global_batch_size 2 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --num_train_epochs 1 \
    --packing true \
    --finetune true \
    --freeze_llm false \
    --freeze_vit false \
    --freeze_aligner false \
    --cross_entropy_loss_fusion true \
    --lr 1e-5 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-6 \
    --output_dir megatron_output/Qwen3.5-4B-FP8 \
    --eval_steps 200 \
    --save_steps 200 \
    --max_length 4096 \
    --dataloader_num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --sequence_parallel true \
    --mtp_num_layers 1 \
    --attention_backend flash
```

`"mtp_decoder_input_detach": true`
[Screenshot 2026-04-18 21:16:45]

`"mtp_decoder_input_detach": false`
[Screenshot 2026-04-18 21:16:04]

@Jintao-Huang
Collaborator Author

#29


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a new configuration option, `mtp_decoder_input_detach`, in `ModelConfig` and implements the corresponding logic in the MTP layer to detach decoder inputs when the option is enabled. It also moves the `apply_module` import to module level for efficiency and improves the `DSAIndexer` patching logic in `patcher.py` by renaming classes to avoid name shadowing. I have no further feedback: the review comments were purely explanatory or confirmed that the existing implementation is correct.
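To illustrate the behavior the flag controls, here is a minimal hypothetical sketch (not the actual Megatron-SWIFT code): when detaching is enabled, the hidden states fed into the MTP (multi-token prediction) decoder are cut off from the autograd graph, so the MTP loss no longer back-propagates into the main transformer trunk. The helper name `mtp_decoder_input` is an assumption for illustration only.

```python
import torch

def mtp_decoder_input(hidden_states: torch.Tensor, detach: bool) -> torch.Tensor:
    """Return the MTP decoder input, optionally detached from the main graph.

    Hypothetical helper sketching the semantics of mtp_decoder_input_detach:
    detach=True severs the autograd link, so gradients from the MTP loss
    cannot flow back into the trunk that produced `hidden_states`.
    """
    return hidden_states.detach() if detach else hidden_states

hidden = torch.randn(2, 4, requires_grad=True)

detached = mtp_decoder_input(hidden, detach=True)
attached = mtp_decoder_input(hidden, detach=False)

# A detached tensor carries no gradient history back to `hidden`.
assert not detached.requires_grad
assert attached.requires_grad
```

Under this reading, the two screenshots above compare loss curves with the MTP gradient either isolated from (true) or flowing into (false) the main decoder stack.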

@Jintao-Huang merged commit 7e9a765 into modelscope:main Apr 18, 2026
1 check passed
