[TRTLLM-12669][refactor] Remove allow_advanced_sampling and capture dual CUDA graphs#14745

Open
zhaoyangwang-nvidia wants to merge 1 commit into
NVIDIA:mainfrom
zhaoyangwang-nvidia:TRTLLM-12669-remove-allow-advanced-sampling

@zhaoyangwang-nvidia zhaoyangwang-nvidia commented May 29, 2026

Replace the static config flag with a per-step, auto-detected `is_all_greedy_sample` flag derived from the actual sampling params of the requests in the batch. The flag is part of the CUDA graph key, so two graph variants (argmax fast path vs. advanced sampling kernel) are captured lazily and the matching one is replayed at dispatch time.

@coderabbitai summary

Description

  • Removed the allow_advanced_sampling config flag from DecodingBaseConfig.
  • Replaced it with a per-step, auto-detected is_all_greedy_sample flag based on
    the actual temperature/top_k/top_p of the requests in the batch.
  • Included is_all_greedy_sample in the CUDA graph key so that two graph variants
    are lazily captured (argmax fast path vs. advanced sampling kernel) and the
    matching one is selected at replay time based on batch composition.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…tected dual-graph dispatch

Remove the static `allow_advanced_sampling` config flag and replace it
with a per-step auto-detected `is_all_greedy_sample` boolean on
SpecMetadata. The flag is computed in `populate_sampling_params_for_one_model`
from the actual temperature/top_k/top_p of every request in the batch.
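
The per-batch detection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RequestSamplingSketch` and `is_greedy` are hypothetical names, and the rule for what counts as greedy (`top_k == 1`, `temperature == 0`, or none of the three parameters set) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RequestSamplingSketch:
    """Hypothetical per-request sampling settings (not the TRT-LLM type)."""
    temperature: Optional[float] = None
    top_k: Optional[int] = None
    top_p: Optional[float] = None


def is_greedy(req: RequestSamplingSketch) -> bool:
    # Assumed rule: top_k == 1 or temperature == 0 forces an argmax pick.
    if req.top_k == 1 or req.temperature == 0:
        return True
    # Assumed rule: with nothing set, decoding defaults to argmax.
    return req.temperature is None and req.top_k is None and req.top_p is None


def is_all_greedy_sample(requests: Iterable[RequestSamplingSketch]) -> bool:
    # A single non-greedy request is enough to require the advanced sampling path.
    return all(is_greedy(r) for r in requests)
```

Under these assumptions the flag is a property of the whole batch, so it can flip from one step to the next as requests join or leave.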

`is_all_greedy_sample` is included in the CUDA graph key so we lazily
capture two graph variants (argmax fast-path vs advanced sampling
kernel) and dispatch by replaying the right one based on the current
batch composition. Both variants stay CUDA-graph-compatible because the
dispatch is a host-side decision outside the captured region.
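
A toy model of the lazy dual-capture idea, assuming the graph key is a `(batch_size, is_all_greedy_sample)` tuple. The class name `DualGraphCacheSketch` and the key layout are hypothetical, and the "graph" here is a plain Python closure standing in for a captured CUDA graph; no CUDA API is called.

```python
from typing import Callable, Dict, Tuple

# Assumed key layout: (batch_size, is_all_greedy_sample).
GraphKey = Tuple[int, bool]


class DualGraphCacheSketch:
    """Toy stand-in for a CUDA graph runner; no real CUDA calls are made."""

    def __init__(self) -> None:
        self.graphs: Dict[GraphKey, Callable[[], str]] = {}
        self.capture_count = 0

    def _capture(self, key: GraphKey) -> Callable[[], str]:
        # In a real runner this is where the forward pass would be captured.
        self.capture_count += 1
        batch_size, all_greedy = key
        variant = "argmax" if all_greedy else "advanced_sampling"
        return lambda: f"replayed {variant} graph for batch_size={batch_size}"

    def run(self, batch_size: int, is_all_greedy_sample: bool) -> str:
        # Host-side dispatch: choose the variant before replay, outside the graph.
        key = (batch_size, is_all_greedy_sample)
        graph = self.graphs.get(key)
        if graph is None:
            # Lazy capture: a variant is only built the first time it is needed.
            graph = self._capture(key)
            self.graphs[key] = graph
        return graph()
```

In this sketch a deployment that only ever sees greedy batches pays for a single capture per batch size; the second variant is captured only once a non-greedy batch shows up.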

Additional optimizations for the all-greedy batch (the common default):
- Populate skips per-token list building and 6 H->D copies entirely.
- Rejection sampling is bypassed (argmax is equivalent for all-greedy)
  in both linear and dynamic-tree paths.
- _compute_and_store_draft_probs is skipped, saving a softmax pass and
  draft-probs copy.
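
Why argmax can replace rejection sampling for an all-greedy batch: under greedy decoding the target distribution is one-hot, so the standard acceptance ratio is 1 for a draft token that matches the target argmax and 0 otherwise, and the residual distribution used after a rejection collapses onto the target argmax. The linear-path case can be sketched as below; `greedy_verify` is a hypothetical helper, not a function from this PR, and the dynamic-tree path is not modeled.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    # Index of the largest logit (the first one on ties).
    return max(range(len(row)), key=row.__getitem__)


def greedy_verify(draft_tokens: Sequence[int],
                  target_logits: Sequence[Sequence[float]]) -> List[int]:
    """Accept draft tokens while they match the target argmax.

    target_logits is assumed to hold len(draft_tokens) + 1 rows: one per
    draft position plus one extra row for the bonus token.
    """
    target_tokens = [argmax(row) for row in target_logits]
    accepted: List[int] = []
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            # First mismatch: emit the target's token and stop.
            accepted.append(target)
            return accepted
        accepted.append(draft)
    # Every draft token matched, so the bonus token is emitted as well.
    accepted.append(target_tokens[len(draft_tokens)])
    return accepted
```

Because this path never reads the draft probabilities, skipping the draft softmax and draft-probs copy is consistent with it.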

Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>
@zhaoyangwang-nvidia zhaoyangwang-nvidia force-pushed the TRTLLM-12669-remove-allow-advanced-sampling branch from 903b453 to d237690 Compare May 29, 2026 10:05
@zhaoyangwang-nvidia zhaoyangwang-nvidia marked this pull request as ready for review May 29, 2026 10:10
@zhaoyangwang-nvidia zhaoyangwang-nvidia requested review from a team as code owners May 29, 2026 10:10
@zhaoyangwang-nvidia
Collaborator Author

/bot run

@zhaoyangwang-nvidia
Collaborator Author

Hi @mikeiovine please help to review this PR, thanks~

@tensorrt-cicd
Collaborator

PR_Github #51043 [ run ] triggered by Bot. Commit: d237690 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #51043 [ run ] completed with state SUCCESS. Commit: d237690
/LLM/main/L0_MergeRequest_PR pipeline #40490 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation
