qwen: Add support for Qwen3-Embedding models in encode_only mode #14758

Open
Priyanshu31102003 wants to merge 1 commit into NVIDIA:main from Priyanshu31102003:feature-qwen3-embedding-support
Conversation

@Priyanshu31102003 Priyanshu31102003 commented May 29, 2026

Description

This PR introduces support for Qwen3-Embedding architectures within the Qwen model implementation. It enables encode_only mode by bypassing the language modeling head projection when an embedding or reward model architecture is detected.

Changes

  • config.py: Added an is_embedding attribute to QWenConfig, plus auto-detection logic in from_hugging_face that sets the flag for embedding-specific architectures (e.g., Qwen2ForRewardModel, Qwen2ForEmbedding, Qwen3ForEmbedding).
  • model.py: Updated QWenForCausalLM.__init__ to skip lm_head allocation when is_embedding is set. Updated the ModelWeightsLoader custom dictionary to map the layer normalization weights (q_layernorm/k_layernorm to q_norm/k_norm) for embedding models, without requiring classification target bounds.
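
The two changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: `detect_is_embedding` and `SketchCausalLM` are hypothetical names, and the LM head is represented by a plain shape tuple rather than a real layer. Only the architecture names come from the description.

```python
# Illustrative sketch only; every name here is a hypothetical stand-in,
# not TensorRT-LLM code.
from typing import Optional

# Architecture names listed in the PR description as embedding-specific.
EMBEDDING_ARCHITECTURES = frozenset({
    "Qwen2ForRewardModel",
    "Qwen2ForEmbedding",
    "Qwen3ForEmbedding",
})


def detect_is_embedding(hf_config: dict) -> bool:
    """Return True if any declared architecture is an embedding-style one."""
    architectures = hf_config.get("architectures") or []
    return any(arch in EMBEDDING_ARCHITECTURES for arch in architectures)


class SketchCausalLM:
    """Toy model that allocates an LM head only for generative checkpoints."""

    def __init__(self, hidden_size: int, vocab_size: int,
                 is_embedding: bool = False):
        self.is_embedding = is_embedding
        # A (vocab_size, hidden_size) tuple stands in for the projection layer.
        self.lm_head: Optional[tuple] = (
            None if is_embedding else (vocab_size, hidden_size))


hf_config = {"architectures": ["Qwen3ForEmbedding"]}
model = SketchCausalLM(hidden_size=8, vocab_size=32,
                       is_embedding=detect_is_embedding(hf_config))
```

In this sketch an embedding checkpoint ends up with `lm_head` set to `None`, while a config without one of the listed architectures keeps the head.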

Summary by CodeRabbit

  • New Features
    • Extended Qwen model support to include dedicated embedding architectures with specialized configuration handling.
    • Implemented automatic detection and proper initialization of embedding-only model variants during HuggingFace conversion.
    • Added optimized weight mapping for embedding model types to ensure correct model structure.

coderabbitai Bot commented May 29, 2026

📝 Walkthrough


This PR extends Qwen model support to embedding-only variants by introducing an is_embedding configuration flag detected during HuggingFace loading. When enabled, the model skips language model head construction and adjusts weight mapping for embedding architectures.

Changes

Embedding-only Qwen model support

| Layer / File(s) | Summary |
| --- | --- |
| Config parameter and HuggingFace detection (tensorrt_llm/models/qwen/config.py) | QWenConfig.__init__ adds an is_embedding: bool = False parameter and stores it. from_hugging_face extends the qwen_type list to include qwen3_embedding, detects embedding models from architecture names, and passes the derived flag to the constructor. |
| Model conditioning for embedding-only mode (tensorrt_llm/models/qwen/model.py) | QWenForCausalLM.__init__ skips building a vocabulary ColumnLinear head and sets lm_head = None when config.is_embedding is true. The weight conversion custom key mapping adds logic to map q_layernorm/k_layernorm to q_norm/k_norm for embedding models. |
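
The key mapping described in the summary can be pictured with a small sketch. It assumes the mapping is applied per dotted name segment, which is an assumption for illustration and not a description of how ModelWeightsLoader works internally. `to_checkpoint_key` and the example parameter path are hypothetical.

```python
# Illustrative sketch only; to_checkpoint_key is a hypothetical helper and
# does not reproduce ModelWeightsLoader's actual translation logic.

# Mapping quoted from the PR: model-side name -> checkpoint-side name.
CUSTOM_DICT = {
    "q_layernorm": "q_norm",
    "k_layernorm": "k_norm",
}


def to_checkpoint_key(param_name: str, custom_dict: dict) -> str:
    """Rename each dotted segment of a parameter path via custom_dict."""
    segments = param_name.split(".")
    return ".".join(custom_dict.get(segment, segment) for segment in segments)


# Hypothetical parameter path, used only to exercise the helper.
example = to_checkpoint_key("layers.0.attention.q_layernorm.weight",
                            CUSTOM_DICT)
```

Segments that are not in the dictionary pass through unchanged, so only the two layer normalization names are rewritten.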

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description adequately explains the purpose and changes, but omits required sections from the template: Title format (ticket/type), Test Coverage, and PR Checklist. | Add PR title with proper format [ticket/issue][type], include Test Coverage section listing relevant tests, and complete the PR Checklist with required items. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding support for Qwen3-Embedding models in encode_only mode, which aligns with the primary modifications to config.py and model.py. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/models/qwen/config.py (1)

38-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve is_embedding in config serialization.

QWenConfig.to_dict() is still hand-maintained and never emits this new field. Any config round-trip will silently clear embedding-only mode and rebuild a regular LM head on reload.

Suggested fix
     def to_dict(self):
         output = super().to_dict()
         # Serialize the fields added in QWenConfig
         output['mlp_bias'] = self.mlp_bias
         output['attn_bias'] = self.attn_bias
         output['rotary_base'] = self.rotary_base
         output['rotary_scaling'] = self.rotary_scaling
         output[
             'disable_weight_only_quant_plugin'] = self.disable_weight_only_quant_plugin
         output['use_logn_attn'] = self.use_logn_attn
         output['mlp_only_layers'] = self.mlp_only_layers
         output['decoder_sparse_step'] = self.decoder_sparse_step
+        output['is_embedding'] = self.is_embedding
         output['moe'] = self.moe.to_dict()
         return output
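
The round-trip problem described in this comment can be reproduced with a toy config. This is a sketch under stated assumptions: `SketchConfig` is a hypothetical class with only two fields, and its `emit_is_embedding` switch exists purely to compare the behavior before and after the suggested fix.

```python
# Illustrative sketch only; SketchConfig is a hypothetical stand-in for a
# config class with a hand-maintained to_dict(), not QWenConfig itself.


class SketchConfig:
    def __init__(self, mlp_bias: bool = False, is_embedding: bool = False):
        self.mlp_bias = mlp_bias
        self.is_embedding = is_embedding

    def to_dict(self, emit_is_embedding: bool) -> dict:
        """Serialize fields; the switch models the presence of the fix."""
        output = {"mlp_bias": self.mlp_bias}
        if emit_is_embedding:
            output["is_embedding"] = self.is_embedding
        return output

    @classmethod
    def from_dict(cls, data: dict) -> "SketchConfig":
        # Missing keys fall back to constructor defaults (is_embedding=False).
        return cls(**data)


original = SketchConfig(is_embedding=True)

# Without the field in to_dict(), the reloaded config reverts to the default.
lossy = SketchConfig.from_dict(original.to_dict(emit_is_embedding=False))

# With the field emitted, the flag survives the round trip.
preserved = SketchConfig.from_dict(original.to_dict(emit_is_embedding=True))
```

Because the constructor default is `False`, a missing key is silent: nothing fails on reload, the flag is simply lost.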
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/models/qwen/config.py` around lines 38 - 49, QWenConfig
currently stores is_embedding but QWenConfig.to_dict() does not emit it, so
include "is_embedding" in the dict returned by QWenConfig.to_dict();
additionally ensure any deserialization path (e.g., QWenConfig.from_dict or the
constructor signature) accepts and preserves the is_embedding flag so a
round-trip retains embedding-only mode rather than rebuilding a regular LM head.
Reference: QWenConfig.to_dict, QWenConfig.from_dict/__init__, and the
is_embedding attribute.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f865eaba-b1e3-4935-ae6f-832589d3f3da

📥 Commits

Reviewing files that changed from the base of the PR and between ebbbec4 and e4fb020.

📒 Files selected for processing (2)
  • tensorrt_llm/models/qwen/config.py
  • tensorrt_llm/models/qwen/model.py

Comment on lines +385 to +389
        elif getattr(config, 'is_embedding', False):
            custom_dict = {
                "q_layernorm": "q_norm",
                "k_layernorm": "k_norm",
            }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep qwen3_embedding on the Qwen3 attention path.

This new mapping assumes embedding checkpoints expose q_norm/k_norm, but QWenDecoderLayer still enables qk_layernorm only for ('qwen3', 'qwen3_moe'). If a checkpoint reports model_type == 'qwen3_embedding', the loader and model definition diverge, and config.py also still leaves attn_bias=True for that type.

Suggested fix
-        qk_layernorm = config.qwen_type in ('qwen3', 'qwen3_moe')
+        qk_layernorm = config.qwen_type in (
+            'qwen3',
+            'qwen3_moe',
+            'qwen3_embedding',
+        )

Also mirror the same qwen3_embedding addition in tensorrt_llm/models/qwen/config.py where attn_bias is derived from the Qwen3-only tuple.
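
One way to keep the two checks from drifting apart is to derive both from a single tuple. The sketch below assumes, based on the comment above, that attn_bias is disabled for the Qwen3-family types and enabled otherwise. `QK_NORM_QWEN_TYPES`, `uses_qk_layernorm`, and `default_attn_bias` are hypothetical names, not identifiers from the repository.

```python
# Illustrative sketch only; these names are hypothetical and the attn_bias
# rule is an assumption inferred from the review comment.

# Single source of truth for types that follow the Qwen3 attention path.
QK_NORM_QWEN_TYPES = ("qwen3", "qwen3_moe", "qwen3_embedding")


def uses_qk_layernorm(qwen_type: str) -> bool:
    """Whether the decoder layer should enable q/k layer normalization."""
    return qwen_type in QK_NORM_QWEN_TYPES


def default_attn_bias(qwen_type: str) -> bool:
    """Assumed rule: Qwen3-family types run without attention bias."""
    return qwen_type not in QK_NORM_QWEN_TYPES


# Both decisions for the new type now come from the same tuple.
embedding_settings = (uses_qk_layernorm("qwen3_embedding"),
                      default_attn_bias("qwen3_embedding"))
```

Sharing the tuple means adding a new Qwen3-style type is a one-line change that updates the loader-facing and config-facing decisions together.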

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/models/qwen/model.py` around lines 385 - 389, The embedding
checkpoint type 'qwen3_embedding' must be treated like the Qwen3 attention path:
update the embedding-name-to-parameter mapping block (where getattr(config,
'is_embedding', False) builds custom_dict with "q_layernorm"/"k_layernorm") and
ensure the model loader and QWenDecoderLayer recognize 'qwen3_embedding' the
same way as 'qwen3' and 'qwen3_moe'; also update the attn_bias derivation in
tensorrt_llm/models/qwen/config.py (the tuple that currently lists
('qwen3','qwen3_moe')) to include 'qwen3_embedding' so config.attn_bias,
model_type checks, and QWenDecoderLayer's qk_layernorm handling remain
consistent for that checkpoint.

Signed-off-by: jet1technology-tech <jet1technology@ryngo.in>
@Priyanshu31102003 Priyanshu31102003 force-pushed the feature-qwen3-embedding-support branch from 222d506 to c52800f Compare May 29, 2026 19:40