
[CUDNN] Update frontend to version 1.22 and add cuDNN 9.20 path for SM arch >100 #2838

Open
zmelumian972 wants to merge 2 commits into NVIDIA:main from zmelumian972:cudnn/support_version_1.22

Conversation

@zmelumian972

@zmelumian972 zmelumian972 commented Apr 5, 2026

Summary

  • Updates the cudnn-frontend submodule to version 1.22 (97f6cb3b)
  • Adds a new cuDNN 9.20 path in nvte_get_fused_attn_backend for Blackwell (SM arch >= 100) that supports any head dimension, both forward and backward passes, non-paged layouts, and any sequence length

Changes

  • 3rdparty/cudnn-frontend: Bump submodule from 7b9b711c to 97f6cb3b (cuDNN frontend v1.22)
  • transformer_engine/common/fused_attn/fused_attn.cpp: Add cuDNN 9.20 backend selection condition:
    • Enables FusedAttn_F16_Arbitrary_Seqlen backend for SM >= 100 + cuDNN >= 9.20 + non-paged KV layouts
    • Fixes the logical operator joining the 9.11 condition from && to || to correctly OR the two Blackwell conditions

Test plan

  • Verify FusedAttention with cuDNN 9.20+ on Blackwell (SM >= 100) hardware
  • Confirm existing Hopper (SM 90) paths are unaffected
  • Run fused attention unit tests for paged/non-paged layouts

🤖 Generated with Claude Code

@zmelumian972 zmelumian972 force-pushed the cudnn/support_version_1.22 branch 2 times, most recently from d72d8a2 to dcef948 on April 5, 2026 16:05
@greptile-apps
Contributor

greptile-apps bot commented Apr 5, 2026

Greptile Summary

Bumps cudnn-frontend to v1.22 and adds a cuDNN 9.20 head-dimension OR branch for Blackwell (SM≥100, non-paged, fprop+bprop, any sq) inside the existing flag_arb check in nvte_get_fused_attn_backend. The parenthesis accounting is correct: the trailing && on the 9.11 clause is correctly changed to || to insert the new sub-condition inside the head-dim OR group, and all outer gates (architecture, mask type, QKV format, sliding window, determinism) continue to apply to the 9.20 path unchanged.

Confidence Score: 5/5

Safe to merge — the logic change is structurally correct and all pre-existing runtime gates still apply to the new 9.20 path.

No new P0/P1 findings. The && → || restructuring is correct: it moves the closing ) of the 9.11 sub-condition one level inward and appends the 9.20 clause as a sibling OR branch before the outer group closes, keeping the bug-exclusion && in the right place. The 9.20 condition inherits all surrounding guards (the SM≥100 arch check, mask-type block, QKV-format block, and Blackwell determinism check), so no previously gated combinations are accidentally opened beyond what cuDNN 9.20 is expected to support. The one open question, sq=1 + causal on Blackwell/9.20, was already raised in a prior review thread.

No files require special attention.

Important Files Changed

Filename | Overview
transformer_engine/common/fused_attn/fused_attn.cpp | Adds cuDNN 9.20 head-dim OR branch for Blackwell (SM≥100, fprop+bprop, non-paged); changes the trailing && on the 9.11 clause to ||.
3rdparty/cudnn-frontend | Submodule bump from 7b9b711c to 97f6cb3b (cudnn-frontend v1.22); no source changes in this repo.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["F16/BF16 path entered"] --> B{"arch check\nSM≥100 requires\ncuDNN≥9.7"}
    B -->|pass| C{"head_dim % 8 == 0\nfor qk and v?"}
    B -->|fail| Z["flag_arb = false"]
    C -->|yes| D{"Which head_dim\ncondition matches?"}
    C -->|no| Z
    D --> D1["≤128 (any version)"]
    D --> D2["≤256 + Hopper\n(9.1 fprop / 9.5 bprop)"]
    D --> D3["any dim + Blackwell\n+ fprop + non-paged\n+ sq>1  (9.9)"]
    D --> D4["any dim + fprop\n+ any arch (9.10.2)"]
    D --> D5["dqk=192/dv=128\n+ Blackwell + bprop\n+ non-paged (9.11)"]
    D --> D6["any dim + Blackwell\n+ fprop/bprop\n+ non-paged (9.20 NEW)"]
    D1 & D2 & D3 & D4 & D5 & D6 --> E{"Hopper bprop\nbug exclusion\n(sm==90 only)"}
    E -->|not blocked| F{"mask type,\nformat, SWA,\ndeterminism\ngates"}
    E -->|blocked| Z
    F -->|all pass| G["flag_arb = true\n→ NVTE_F16_arbitrary_seqlen"]
    F -->|any fail| Z

Reviews (2): Last reviewed commit: "FusedAttention: Add cudnn 9.20 path for ..."

Comment on lines +343 to +345
// 9.20: any head_dim + Blackwell + fprop/bprop + non_paged + any sq
(sm_arch_ >= 100 && cudnn_runtime_version >= 92000 &&
layout_group != NVTE_QKV_Layout_Group::NVTE_Paged_KV_HD_HD_HD)) &&

P2 Verify sq=1 + causal/padding_causal fprop support in cuDNN 9.20

The 9.20 condition allows any max_seqlen_q (including sq = 1) with any mask type on non-paged Blackwell layouts. The preceding 9.10.2 fprop path explicitly excluded sq = 1 + causal and sq = 1 + padding_causal on non-paged layouts:

(max_seqlen_q == 1 && attn_mask_type != NVTE_Mask_Type::NVTE_CAUSAL_MASK &&
 attn_mask_type != NVTE_Mask_Type::NVTE_PADDING_CAUSAL_MASK)

With the 9.20 path (any sq, no mask-type restriction at the head-dim level), sq=1 + causal + non-paged + fprop on Blackwell/cuDNN≥9.20 will now pass this gate — where it was previously blocked. If cuDNN 9.20 lifts this restriction for SM≥100, this is correct. If not, passing this combination to the backend would produce a runtime error. Please confirm whether cuDNN 9.20 actually supports this combination on Blackwell.

@KshitijLakhani
Collaborator

/te-ci jax L0

@jberchtold-nvidia jberchtold-nvidia self-assigned this Apr 8, 2026
Signed-off-by: zmelumian972 <zmelumian@gmail.com>
Signed-off-by: zmelumian972 <zmelumian@gmail.com>
@zmelumian972 zmelumian972 force-pushed the cudnn/support_version_1.22 branch from dcef948 to d217bf9 on April 16, 2026 05:56
3 participants