[https://nvbugs/6240561][fix] Autodeploy fix the deepseek accuracy drop by nvchenghaoz · Pull Request #14774 · NVIDIA/TensorRT-LLM

nvchenghaoz · 2026-05-30T02:55:50Z

Summary by CodeRabbit

Tests
- Improved validation of rotary embeddings with enhanced test assertions.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions)
- If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
- Any new dependencies have been scanned for license and vulnerabilities
- CODEOWNERS updated if ownership changes
- Documentation updated as needed
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

coderabbitai · 2026-05-30T03:00:17Z

📝 Walkthrough

The config fallback path that computes RoPE inverse frequencies and YaRN interpolation is refactored. Inverse frequencies are now derived from indices spanning the full rotary dimension, the YaRN scaling denominator is computed through a single unified path, and interpolation blends the two frequency terms explicitly through a mask. A test assertion is added to validate the result.

Changes

RoPE YaRN Computation Update

| Layer / File(s) | Summary |
| --- | --- |
| RoPE inverse frequency and YaRN scaling setup `tensorrt_llm/_torch/auto_deploy/transform/library/fuse_rope_mla.py` | Inverse frequency is computed from even-dimension indices across full `qk_rope_head_dim`; YaRN denominator calculation always invokes `_yarn_get_mscale()` without conditional fallback. |
| YaRN interpolation mask-based blending `tensorrt_llm/_torch/auto_deploy/transform/library/fuse_rope_mla.py`, `tests/unittest/auto_deploy/singlegpu/models/test_deepseek_custom.py` | YaRN interpolation replaces smooth-step with `inv_freq_mask`-based explicit blending between `freq_inter` and `freq_extra`; test adds exact-match assertion before approximate comparison. |
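
The mask-based blending summarized in the table can be illustrated with a minimal plain-Python sketch. It follows the commonly used YaRN formulation and is not taken from the PR diff: the function names, the `beta_fast`/`beta_slow` defaults, and the example values are assumptions for illustration, and only `freq_inter`, `freq_extra`, and `inv_freq_mask` mirror names mentioned in the summary.

```python
import math


def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    """YaRN magnitude correction; 1.0 when no context scaling is applied."""
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


def find_correction_dim(num_rotations, dim, base, max_pos):
    """Frequency-pair index whose wavelength fits `num_rotations` turns in `max_pos` tokens."""
    return dim * math.log(max_pos / (num_rotations * 2 * math.pi)) / (2 * math.log(base))


def yarn_inv_freq(dim, base, factor, max_pos, beta_fast=32, beta_slow=1):
    """Blend extrapolated and interpolated inverse frequencies with a linear ramp mask."""
    half = dim // 2
    # Even indices 0, 2, ..., dim - 2 normalized by the full rotary dimension.
    exponents = [2 * i / dim for i in range(half)]
    freq_extra = [1.0 / base**e for e in exponents]             # unscaled
    freq_inter = [1.0 / (factor * base**e) for e in exponents]  # scaled by factor

    low = max(math.floor(find_correction_dim(beta_fast, dim, base, max_pos)), 0)
    high = min(math.ceil(find_correction_dim(beta_slow, dim, base, max_pos)), dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(half):
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq_mask = 1.0 - ramp  # 1 keeps freq_extra, 0 keeps freq_inter
        inv_freq.append(freq_inter[i] * (1.0 - inv_freq_mask) + freq_extra[i] * inv_freq_mask)
    return inv_freq


# Illustrative values only; real values come from the model config.
inv_freq = yarn_inv_freq(dim=64, base=10000.0, factor=40.0, max_pos=4096)
# Scale applied to cos/sin as a ratio of two mscale terms (both set to 1.0 here).
cos_sin_scale = yarn_get_mscale(40.0, 1.0) / yarn_get_mscale(40.0, 1.0)
```

In this sketch the highest-frequency pairs keep their unscaled value, the lowest-frequency pairs are fully divided by `factor`, and the pairs in between are mixed linearly.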

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

govind-ramnarayan
bmarimuthu-nv
MrGeva

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description contains only the template with placeholders unfilled; no actual description, test coverage details, or explanation of changes is provided. | Fill in the Description section explaining the RoPE/YaRN computation changes and their purpose, add Test Coverage section listing relevant tests, and ensure the PR title follows the format with an appropriate ticket/issue reference. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title references the NVBugs ticket and fix type correctly, and specifically mentions 'deepseek accuracy drop' which aligns with the code changes targeting RoPE/YaRN rotary embedding fixes. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unittest/auto_deploy/singlegpu/models/test_deepseek_custom.py`:
- Around line 192-194: The test currently uses torch.equal(actual, expected)
which enforces bitwise equality between actual (from
_compute_rotary_cos_sin_from_config(Factory()).cpu()) and expected (CPU),
causing cross-backend flakiness; remove that torch.equal check and rely only on
the tolerance-based assertion torch.testing.assert_close(actual, expected,
atol=3e-7, rtol=1e-4), or alternatively compute expected on the same backend as
actual by calling .to(actual.device) before comparing if you need strict
regression. Ensure changes target the assertions around
_compute_rotary_cos_sin_from_config, Factory, actual, and expected in this test.
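
The difference between the two assertions discussed above can be shown with a small stand-in that needs no GPU or PyTorch install. This is a sketch, not the test from the PR: `assert_close` here is a hypothetical plain-Python helper that applies the elementwise rule `|actual - expected| <= atol + rtol * |expected|` used by `torch.testing.assert_close`, and the perturbation of `1e-9` is an assumed stand-in for backend rounding differences.

```python
import math


def assert_close(actual, expected, atol, rtol):
    """Hypothetical helper: pass when |a - e| <= atol + rtol * |e| for every element."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        if abs(a - e) > atol + rtol * abs(e):
            raise AssertionError(f"mismatch at index {i}: {a} vs {e}")


# Reference values, as if computed on the CPU.
expected = [math.cos(0.25 * i) for i in range(8)]
# Simulated result from another backend that differs by a tiny rounding error.
actual = [v + 1e-9 for v in expected]

# An exact comparison (the role torch.equal plays in the test) rejects the result.
bitwise_equal = actual == expected

# The tolerance-based comparison accepts it with the tolerances quoted in the review.
assert_close(actual, expected, atol=3e-7, rtol=1e-4)
```

With these values `bitwise_equal` is `False` while the tolerance check passes, which is the kind of cross-backend flakiness the review comment warns about.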

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 229e0bf0-ee11-4c7c-8b56-4765f60520da

📥 Commits

Reviewing files that changed from the base of the PR and between 74d7c3a and 3f40f8e.

📒 Files selected for processing (2)

tensorrt_llm/_torch/auto_deploy/transform/library/fuse_rope_mla.py
tests/unittest/auto_deploy/singlegpu/models/test_deepseek_custom.py

taylor-yb-lee · 2026-05-30T03:16:35Z

Issue link: https://jirasw.nvidia.com/browse/TRTLLM-13054

nvchenghaoz · 2026-05-30T03:49:54Z

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --disable-fail-fast

tensorrt-cicd · 2026-05-30T03:55:31Z

PR_Github #51151 [ run ] triggered by Bot. Commit: 3f40f8e Link to invocation

tensorrt-cicd · 2026-05-30T09:00:45Z

PR_Github #51151 [ run ] completed with state SUCCESS. Commit: 3f40f8e
/LLM/main/L0_MergeRequest_PR pipeline #40585 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR
- If you cannot view the failures, ask the CI triggerer to share details
- Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

fix the deepseek accuracy drop

3f40f8e

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

nvchenghaoz requested a review from a team as a code owner May 30, 2026 02:55

nvchenghaoz requested a review from MrGeva May 30, 2026 02:55

github-actions Bot assigned nvchenghaoz May 30, 2026

nvchenghaoz requested review from suyoggupta and taylor-yb-lee May 30, 2026 02:56

coderabbitai Bot reviewed May 30, 2026

nvchenghaoz changed the title ~~[None][fix] Autodeploy fix the deepseek accuracy drop~~ [https://nvbugs/6240561][fix] Autodeploy fix the deepseek accuracy drop May 30, 2026

nvchenghaoz wants to merge 1 commit into NVIDIA:main from nv-auto-deploy:chenghao/rope_0529

3 participants
