[Docs]: release/v1.21.6 doc update #1007

Open
abukhoy wants to merge 1 commit into quic:release/v1.21.6 from abukhoy:doc-update-v1.21.6

@abukhoy (Contributor) commented May 25, 2026

This PR updates the release docs for the release branch release/v1.21.6.

Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
# Efficient Transformer Library - 1.21.6 Release Notes

Welcome to the official release of **Efficient Transformer Library v1.21.6**! This targeted release builds on the v1.21 line with multi-resolution Vision Language Model workflows, Qwen3-VL stability fixes, on-device sampling enablement, and compatibility updates for newer model and framework APIs.

Review comment (Contributor): Also add online serving support for Gemma4 through vLLM.

## Key Features & Enhancements

- **Multi-specialization vision compilation for Qwen VLMs**
- Qwen2.5-VL, Qwen3-VL Dense, and Qwen3-VL-MoE can compile multiple vision resolution and frame configurations in one pass.

Review comment (Contributor): We should remove the Qwen3-VL-MoE model from this list, as it was not tested by SIT in this release. We should keep only the models that are vetted by SIT.

- `height`, `width`, and `num_frames` can be supplied as lists when building specializations.
- Runtime generation can select the matching specialization through the multi-frame generation path.
- New example scripts are available for [Qwen2.5-VL](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen2_5_vl), [Qwen3-VL Dense](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen3vl), and [Qwen3-VL-MoE](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen3_vl_moe).

Review comment (Contributor): Also remove the Qwen3-VL-MoE example script. Keep only the Qwen2.5-VL and Qwen3-VL Dense models.

- Adds regression coverage for large embedding and reranker model export flows.
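
The list-valued `height`, `width`, and `num_frames` inputs described in the multi-specialization item can be illustrated with a small, library-independent sketch. Everything here is hypothetical: `VisionSpecialization`, `build_specializations`, and `select_specialization` are illustrative names rather than the library's API, and the Cartesian-product expansion is an assumption, since the notes do not say how the lists are combined.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VisionSpecialization:
    # Hypothetical record of one compiled vision shape; not a library type.
    height: int
    width: int
    num_frames: int


def as_list(value):
    """Accept a scalar or a list so single-value callers keep working."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def build_specializations(height, width, num_frames):
    """Expand list-valued inputs into one specialization per combination.

    Assumption: combinations are formed as a Cartesian product. The real
    library may pair the lists differently.
    """
    combos = product(as_list(height), as_list(width), as_list(num_frames))
    return [VisionSpecialization(h, w, f) for h, w, f in combos]


def select_specialization(specs, height, width, num_frames):
    """Return the specialization that exactly matches the runtime input."""
    wanted = VisionSpecialization(height, width, num_frames)
    if wanted not in specs:
        raise ValueError(f"no compiled specialization for {wanted}")
    return wanted


# Example values are arbitrary and chosen only for illustration.
specs = build_specializations(height=[336, 672], width=[336, 672], num_frames=[1, 8])
chosen = select_specialization(specs, height=672, width=336, num_frames=8)
```

Exact matching keeps the sketch simple; a real runtime might instead pad or resize inputs to the nearest compiled shape.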

- **Qwen VLM runtime stability**
- Fixes RoPE handling for Qwen3-VL-MoE disaggregated mode.

Review comment (Contributor): Remove this line.


- **Gemma3 configuration compatibility**
- Updates Gemma3 cache handling for the newer `_sliding_window_pattern` config field.
- Preserves sliding-window behavior for Gemma3 models using updated Transformers configs.

Review comment (Contributor): Add a note that online serving support for Gemma3 through vLLM is added.
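
The Gemma3 `_sliding_window_pattern` compatibility item above can be pictured as a lookup that accepts either field name. This is a minimal sketch under stated assumptions, not the library's implementation: the helper names are hypothetical, the configs are plain `SimpleNamespace` stand-ins, and the rule that every `pattern`-th layer uses global attention is assumed here for illustration.

```python
from types import SimpleNamespace


def get_sliding_window_pattern(config, default=None):
    """Read the pattern from the newer field name, then the older one."""
    for name in ("_sliding_window_pattern", "sliding_window_pattern"):
        value = getattr(config, name, None)
        if value is not None:
            return value
    return default


def is_sliding_layer(layer_idx, pattern):
    """Assumed convention: every `pattern`-th layer is global, the rest slide."""
    return bool((layer_idx + 1) % pattern)


# Stand-in configs; real configs would come from Transformers.
old_cfg = SimpleNamespace(sliding_window_pattern=6)
new_cfg = SimpleNamespace(_sliding_window_pattern=6)

old_pattern = get_sliding_window_pattern(old_cfg)
new_pattern = get_sliding_window_pattern(new_cfg)
layers = [is_sliding_layer(i, new_pattern) for i in range(6)]
```

Checking both names in one place means the cache logic sees the same pattern regardless of which Transformers version produced the config.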

- Accepts `vision_feature_layer` and `vision_feature_select_strategy` forwarded by newer Transformers Llama4 APIs.
- Fixes ONNX export failures for Llama4 vision models while remaining backward compatible.
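
The backward-compatible handling of `vision_feature_layer` and `vision_feature_select_strategy` can be sketched as optional keyword arguments with defaults, so older call sites that omit them keep working. The function below is a hypothetical stand-in, not the Llama4 method itself: its body is a placeholder computation, and leaving the two arguments unused is an assumption made for this sketch.

```python
def get_image_features(
    pixel_values,
    vision_feature_layer=None,
    vision_feature_select_strategy=None,
    **kwargs,
):
    """Hypothetical stand-in for a vision feature extractor.

    The two optional arguments are accepted so that newer callers can
    forward them. In this sketch they are unused (an assumption).
    """
    # Placeholder computation standing in for the real vision tower.
    return [2 * value for value in pixel_values]


# Older call style: no extra arguments.
old_style = get_image_features([1, 2, 3])

# Newer call style: the caller forwards the extra arguments.
new_style = get_image_features(
    [1, 2, 3],
    vision_feature_layer=-1,
    vision_feature_select_strategy="default",
)
```

Both call styles return the same result here, which is the property the compatibility fix is meant to preserve.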

---

Review comment (Contributor): Add a note that support for GPT OSS 120B with BS>1 and GPT OSS 20B with BS>2 is enabled.
