Support MTP loss mask_type v1 and multi mtp_config by x54-729 · Pull Request #1919 · InternLM/xtuner

x54-729 · 2026-06-16T15:26:23Z

mtp config example:

model_cfg.text_config.mtp_config = [
    MTPConfig(name="normal", num_layers=TEXT_MTP_LAYERS, share_weights=TEXT_MTP_LAYERS>1, loss_scaling_factor=TEXT_MTP_FACTOR),
    MTPConfig(name="sci", num_layers=NUM_MTP_LAYERS, share_weights=NUM_MTP_LAYERS>1, loss_scaling_factor=NUM_MTP_FACTOR, loss_cfg=SciMTPLossConfig(mask_type="v1"))
]
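Since mtp_config is now a list while older configs carry a single entry, consuming code may need to accept both forms. The following is a minimal sketch of that normalization, not xtuner's implementation: `MTPConfigStub` and `normalize_mtp_config` are hypothetical names, and the stub only mirrors the fields shown in the example above. The uniqueness check assumes names act as lookup keys, as the review snippets later in this thread suggest.

```python
from dataclasses import dataclass


@dataclass
class MTPConfigStub:
    """Stand-in for MTPConfig; only the fields shown in the example above."""
    name: str = "default"  # default value is an assumption
    num_layers: int = 1
    share_weights: bool = False
    loss_scaling_factor: float = 1.0


def normalize_mtp_config(cfg):
    """Return a list of configs for None, a single config, or a list of configs."""
    if cfg is None:
        return []
    configs = list(cfg) if isinstance(cfg, (list, tuple)) else [cfg]
    names = [c.name for c in configs]
    # Names are used as lookup keys, so they must not collide.
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate MTP config names: {names}")
    return configs


configs = normalize_mtp_config(
    [
        MTPConfigStub(name="normal", num_layers=2, share_weights=True),
        MTPConfigStub(name="sci", num_layers=1),
    ]
)
```

With this shape, a single config and a list go through the same code path afterwards.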

Refer to HAOCHENYE#2

HAOCHENYE · 2026-06-16T18:57:33Z

        loss = torch.tensor(0.0, device=DEVICE)
        for key in model_outputs.model_fields:
            value = getattr(model_outputs, key)
-            if "loss" in key and isinstance(value, torch.Tensor):


Loss handling should be self-contained in the model rather than hardcoded here.

HAOCHENYE · 2026-06-16T18:58:39Z

+        mask_type = self.loss_cfg.mask_type
+        if mask_type == "v1":
+            self.process_loss_weight_v1()
+        elif mask_type is not None:
+            raise NotImplementedError(f"Unknown MTP Loss Mask Type: {mask_type}")


The loss calculation logic should not be hardcoded here; please implement a new loss_context.
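One way to read this request is to replace the mask_type branch with a dedicated context class that owns the masking rule. The sketch below only illustrates that dispatch pattern: all class names are hypothetical stand-ins, and the "v1" rule shown (zeroing the weight at given positions) is a placeholder assumption, because the thread does not describe what v1 actually computes.

```python
class LossContextStub:
    """Stand-in base class: holds per-token loss weights and may adjust them."""

    def __init__(self, loss_weight):
        self.loss_weight = list(loss_weight)

    def process_loss_weight(self):
        # Default behaviour: leave the weights untouched.
        return self.loss_weight


class MaskV1LossContextStub(LossContextStub):
    """Hypothetical subclass that owns the "v1" masking logic."""

    def __init__(self, loss_weight, masked_positions):
        super().__init__(loss_weight)
        self.masked_positions = set(masked_positions)

    def process_loss_weight(self):
        # Placeholder rule (an assumption): zero the weight at masked positions.
        return [
            0.0 if i in self.masked_positions else w
            for i, w in enumerate(self.loss_weight)
        ]


# The caller no longer branches on mask_type; it just asks each context.
contexts = [
    LossContextStub([1.0, 1.0, 1.0]),
    MaskV1LossContextStub([1.0, 1.0, 1.0], masked_positions=[1]),
]
weights = [ctx.process_loss_weight() for ctx in contexts]
```

Adding another mask type then means adding another subclass, with no change to the calling code.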

HAOCHENYE · 2026-06-16T19:02:09Z

 import types
 from pathlib import Path
-from typing import TYPE_CHECKING, Annotated, Literal, Self, Sequence, TypedDict, cast
+from typing import TYPE_CHECKING, Annotated, List, Literal, Self, Sequence, TypedDict, cast


Use the Python builtin list instead of typing.List.
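For illustration, builtin generics cover the same annotations without the extra import. The function below is a hypothetical example, not code from the PR; builtin `list[...]` annotations are available from Python 3.9, and the file already imports `Self`, which implies a newer interpreter.

```python
def layer_names(prefix: str, count: int) -> list[str]:
    """Annotated with the builtin list; typing.List is not needed."""
    return [f"{prefix}.{i}" for i in range(count)]


names: list[str] = layer_names("mtp", 2)
```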

HAOCHENYE · 2026-06-17T14:55:21Z

+            if key == "mtp_loss" and isinstance(value, dict):
+                for mtp_loss_name, mtp_loss in value.items():
+                    loss += mtp_loss
+            elif "loss" in key and isinstance(value, torch.Tensor):


Suggested change

-            if key == "mtp_loss" and isinstance(value, dict):
-                for mtp_loss_name, mtp_loss in value.items():
-                    loss += mtp_loss
-            elif "loss" in key and isinstance(value, torch.Tensor):
+            elif "loss" in key:
+                loss_values = list(value.values()) if isinstance(value, dict) else [value]
+                loss_values = [i for i in loss_values if isinstance(i, torch.Tensor)]
+                for value in loss_values:
+                    loss += value
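The suggestion treats a dict-valued loss field and a single loss value uniformly. A minimal sketch of that aggregation follows, under stated assumptions: plain floats stand in for torch.Tensor so it runs without PyTorch, a dict stands in for the model output object, and `sum_losses` plus the field names are illustrative only. The branch starts with `if` here because the sketch has no preceding condition.

```python
def sum_losses(outputs: dict, scalar_type=float):
    """Sum every field whose name contains "loss", flattening dict-valued fields."""
    loss = 0.0
    for key, value in outputs.items():
        if "loss" not in key:
            continue
        # A field may hold one loss or a dict of named losses.
        candidates = list(value.values()) if isinstance(value, dict) else [value]
        # Skip None and anything else that is not a loss value.
        loss += sum(v for v in candidates if isinstance(v, scalar_type))
    return loss


outputs = {
    "loss": 1.5,
    "mtp_loss": {"normal": 0.25, "sci": 0.5},
    "balancing_loss": None,
    "logits": [0.1, 0.2],
}
total = sum_losses(outputs)
```

Because the key test no longer special-cases "mtp_loss", any future dict-valued loss field is picked up without further changes.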

HAOCHENYE · 2026-06-17T14:57:50Z

+        output,
+        layer_hidden_states,
+        position_embeddings,
+        balancing_ctx,
+        z_ctx,
+        mtp_seq_ctx,
+        mtp_loss_ctx_dict,
+        keep_router: bool,


Missing type hints here.

HAOCHENYE · 2026-06-18T08:25:51Z

+        Args:
+            idx (int): 1-indexed MTP layer depth to bind.
+        """
+        self.mtp_depth = idx


Depth and layer count are two different things. The user configures the number of layers in the config, whereas the layer index (which layer this actually is) is known at construction time. I'd suggest moving the bind_mtp_depth logic into LossContext, so that a single MTPConfig can produce LossContext instances for the different layers.
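The separation described here can be sketched as a config that only records the layer count and a context that is bound to one depth when it is built. Both classes are hypothetical stand-ins rather than xtuner types, and `build_contexts` is an assumed helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTPLossConfigStub:
    """Stand-in config: knows only how many MTP layers the user asked for."""
    num_layers: int

    def build_contexts(self):
        # Depth is 1-indexed, matching the docstring quoted above.
        return [
            MTPLossContextStub(config=self, mtp_depth=depth)
            for depth in range(1, self.num_layers + 1)
        ]


@dataclass(frozen=True)
class MTPLossContextStub:
    """Stand-in context: bound to one concrete layer at construction time."""
    config: MTPLossConfigStub
    mtp_depth: int


contexts = MTPLossConfigStub(num_layers=3).build_contexts()
depths = [ctx.mtp_depth for ctx in contexts]
```

The config stays immutable and reusable, while each context carries the depth it was created for.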

HAOCHENYE · 2026-06-18T08:32:23Z

+            for mtp_config in self.config.mtp_config:
+                self._mtp_forward(
+                    mtp_config=mtp_config,
+                    output=output,
+                    layer_hidden_states=layer_hidden_states,
+                    position_embeddings=position_embeddings,


Let's keep this part of the code unchanged for now; this PR shouldn't include code cleanup or refactoring.

HAOCHENYE · 2026-06-18T08:55:26Z

+            global_mtp_idx = 0  # Track global MTP layer index across all mtp_configs
+            for mtp_name in self.mtp_block.keys():
+                mtp_block = self.mtp_block[mtp_name]
+                mtp_config = next((cfg for cfg in self.config.mtp_config if cfg.name == mtp_name), None)  # type: ignore


Just pass mtp_config directly into MTPBlock; there is no need for all this indirection.

HAOCHENYE · 2026-06-18T08:56:24Z

+                        mtp_layer = checkpoint_wrapper(mtp_layer, checkpoint_impl=CheckpointImpl.REENTRANT)
+                    mtp_block.layers[local_mtp_idx] = mtp_layer
+
+                    reshard_after_forward = local_mtp_idx != len(mtp_block.layers) - 1


reshard_after_forward should be set to False only for the last layer of the last MTP block; every other layer should keep it True.
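The diff line above makes the flag False for the last layer of every block. Read that way, the request amounts to applying the exception once, to the last layer of the last block. A minimal sketch of that rule, assuming blocks are processed in order and using the hypothetical helper name `reshard_flags`:

```python
def reshard_flags(layers_per_block: list[int]) -> list[list[bool]]:
    """Return reshard_after_forward for every layer of every MTP block.

    Only the final layer of the final block gets False; all others get True.
    """
    flags = []
    last_block = len(layers_per_block) - 1
    for block_idx, num_layers in enumerate(layers_per_block):
        flags.append(
            [
                not (block_idx == last_block and layer_idx == num_layers - 1)
                for layer_idx in range(num_layers)
            ]
        )
    return flags


# Two blocks: the first with two layers, the second with one.
flags = reshard_flags([2, 1])
```

In the real loop the same condition would compare against the index of the last block as well as the last local layer index.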

HAOCHENYE · 2026-06-18T09:03:10Z

+                        mtp_layer = checkpoint_wrapper(mtp_layer, checkpoint_impl=CheckpointImpl.REENTRANT)
+                    mtp_block.layers[local_mtp_idx] = mtp_layer


Suggested change

-                        mtp_layer = checkpoint_wrapper(mtp_layer, checkpoint_impl=CheckpointImpl.REENTRANT)
-                    mtp_block.layers[local_mtp_idx] = mtp_layer
+		mtp_layer = checkpoint_wrapper(mtp_layer, checkpoint_impl=CheckpointImpl.REENTRANT)
+		mtp_block.layers[local_mtp_idx] = mtp_layer

HAOCHENYE · 2026-06-18T09:05:00Z

        self.rotary_emb = self.build_rotary_embedding(config)
        self.embed_tokens = self.build_embeddings(config)
-        self.mtp_block = self.build_mtp_block(config) if config.mtp_config is not None else None
+        self.mtp_block = self.build_mtp_block_dict(config) if config.mtp_config is not None else None


Could we switch this to a ModuleList too? That way it'd be symmetric with the config.
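A list of blocks that mirrors the list of configs can be sketched as follows. This is an illustration under assumptions: a plain Python list stands in for torch.nn.ModuleList so the snippet runs without PyTorch, `MTPBlockStub` is a hypothetical class, and dicts stand in for MTPConfig objects. It also shows each block receiving its config directly, so no lookup by name is needed.

```python
class MTPBlockStub:
    """Stand-in for MTPBlock that receives its config directly."""

    def __init__(self, mtp_config: dict):
        self.mtp_config = mtp_config
        self.layers = [f"layer{i}" for i in range(mtp_config["num_layers"])]


mtp_configs = [
    {"name": "normal", "num_layers": 2},
    {"name": "sci", "num_layers": 1},
]

# A plain list stands in for torch.nn.ModuleList; index i pairs with mtp_configs[i].
mtp_blocks = [MTPBlockStub(cfg) for cfg in mtp_configs]

# Each block carries its own config, so iteration needs no name-based search.
summary = [(block.mtp_config["name"], len(block.layers)) for block in mtp_blocks]
```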

add multi mtp config; add mtp mask type v1

11f2848

HAOCHENYE reviewed Jun 17, 2026

x54-729 added 2 commits June 17, 2026 16:22

Add SciMTPLossContext SciMTPConfig SciMTPLossConfig; Compatible with …

b427a7f

…single MTP

remove SciMTPConfig; bind layer_idx before build mtp loss ctx

7e80e95

HAOCHENYE reviewed Jun 18, 2026

move bind_mtp_Depth to MTPLossContext

5045ea9

x54-729 wants to merge 4 commits into InternLM:main from x54-729:mtp_260616

2 participants
