[CUDNN] Update frontend to version 1.22 and add cuDNN 9.20 path for SM arch >100 #2838
zmelumian972 wants to merge 2 commits into NVIDIA:main
Force-pushed from d72d8a2 to dcef948.
Greptile Summary
Bumps […]
Confidence Score: 5/5
Safe to merge: the logic change is structurally correct and all pre-existing runtime gates still apply to the new 9.20 path. No new P0/P1 findings. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["F16/BF16 path entered"] --> B{"arch check\nSM≥100 requires\ncuDNN≥9.7"}
B -->|pass| C{"head_dim % 8 == 0\nfor qk and v?"}
B -->|fail| Z["flag_arb = false"]
C -->|yes| D{"Which head_dim\ncondition matches?"}
C -->|no| Z
D --> D1["≤128 (any version)"]
D --> D2["≤256 + Hopper\n(9.1 fprop / 9.5 bprop)"]
D --> D3["any dim + Blackwell\n+ fprop + non-paged\n+ sq>1 (9.9)"]
D --> D4["any dim + fprop\n+ any arch (9.10.2)"]
D --> D5["dqk=192/dv=128\n+ Blackwell + bprop\n+ non-paged (9.11)"]
D --> D6["any dim + Blackwell\n+ fprop/bprop\n+ non-paged (9.20 NEW)"]
D1 & D2 & D3 & D4 & D5 & D6 --> E{"Hopper bprop\nbug exclusion\n(sm==90 only)"}
E -->|not blocked| F{"mask type,\nformat, SWA,\ndeterminism\ngates"}
E -->|blocked| Z
F -->|all pass| G["flag_arb = true\n→ NVTE_F16_arbitrary_seqlen"]
F -->|any fail| Z
Reviews (2): Last reviewed commit: "FusedAttention: Add cudnn 9.20 path for ..."
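The first two gates in the flowchart (the arch check and the head_dim alignment check) can be sketched as small predicates. This is a minimal illustration, not the code in fused_attn.cpp. The helper names `cudnn_version`, `arch_check`, `head_dim_aligned`, and `passes_pre_gates` are hypothetical. The integer version encoding (major*10000 + minor*100 + patch) is inferred from the `cudnn_runtime_version >= 92000` comparison used for cuDNN 9.20 in this PR.

```cpp
// Hypothetical helper: encode a cuDNN version the way the comparisons in this
// PR read, e.g. 9.20.0 -> 92000 and 9.10.2 -> 91002 (assumed encoding).
constexpr int cudnn_version(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}

// Flowchart gate B: SM >= 100 requires cuDNN >= 9.7; older archs pass this gate.
constexpr bool arch_check(int sm_arch, int cudnn_runtime_version) {
  return sm_arch < 100 || cudnn_runtime_version >= cudnn_version(9, 7, 0);
}

// Flowchart gate C: head_dim must be a multiple of 8 for both qk and v.
constexpr bool head_dim_aligned(int head_dim_qk, int head_dim_v) {
  return head_dim_qk % 8 == 0 && head_dim_v % 8 == 0;
}

// Both gates must pass before any of the per-version head_dim conditions
// (D1 to D6 in the flowchart) are considered.
constexpr bool passes_pre_gates(int sm_arch, int cudnn_runtime_version,
                                int head_dim_qk, int head_dim_v) {
  return arch_check(sm_arch, cudnn_runtime_version) &&
         head_dim_aligned(head_dim_qk, head_dim_v);
}

static_assert(cudnn_version(9, 20, 0) == 92000, "9.20.0 encodes to 92000");
static_assert(passes_pre_gates(100, 92000, 192, 128), "Blackwell + 9.20 passes");
```

Failing either gate corresponds to the `flag_arb = false` exit in the flowchart. Passing both only means the head_dim conditions and the later mask, format, SWA, and determinism gates are evaluated next.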
// 9.20: any head_dim + Blackwell + fprop/bprop + non_paged + any sq
(sm_arch_ >= 100 && cudnn_runtime_version >= 92000 &&
 layout_group != NVTE_QKV_Layout_Group::NVTE_Paged_KV_HD_HD_HD)) &&
Verify sq=1 + causal/padding_causal fprop support in cuDNN 9.20
The 9.20 condition allows any max_seqlen_q (including sq = 1) with any mask type on non-paged Blackwell layouts. The preceding 9.10.2 fprop path explicitly excluded sq = 1 + causal and sq = 1 + padding_causal on non-paged layouts:
(max_seqlen_q == 1 && attn_mask_type != NVTE_Mask_Type::NVTE_CAUSAL_MASK &&
 attn_mask_type != NVTE_Mask_Type::NVTE_PADDING_CAUSAL_MASK)
With the 9.20 path (any sq, no mask-type restriction at the head-dim level), sq=1 + causal + non-paged + fprop on Blackwell with cuDNN≥9.20 will now pass this gate, where it was previously blocked. If cuDNN 9.20 lifts this restriction for SM≥100, this is correct. If not, passing this combination to the backend would produce a runtime error. Please confirm whether cuDNN 9.20 actually supports this combination on Blackwell.
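To make the concern concrete, the two clauses can be modeled side by side. This is a simplified sketch, not the code in fused_attn.cpp. `MaskType`, `Query`, `clause_9_10_2`, and `clause_9_20` are hypothetical stand-ins. The shape of the 9.10.2 clause (paged, or sq > 1, or sq = 1 with a non-causal mask) is an assumption reconstructed from the snippet and description in this comment.

```cpp
// Hypothetical stand-in for NVTE_Mask_Type, reduced to the cases discussed here.
enum class MaskType { NoMask, Padding, Causal, PaddingCausal };

// Hypothetical, reduced view of the inputs to the head-dim gate.
struct Query {
  int sm_arch;                // 100 and above = Blackwell
  int cudnn_runtime_version;  // e.g. 92000 for cuDNN 9.20
  bool is_training;           // false = fprop only
  bool paged_kv;              // true = paged KV layout group
  int max_seqlen_q;
  MaskType mask;
};

constexpr bool is_causal(MaskType m) {
  return m == MaskType::Causal || m == MaskType::PaddingCausal;
}

// Assumed shape of the 9.10.2 fprop clause: on non-paged layouts, sq == 1 is
// only accepted when the mask is neither causal nor padding_causal.
constexpr bool clause_9_10_2(const Query& q) {
  return !q.is_training && q.cudnn_runtime_version >= 91002 &&
         (q.paged_kv || q.max_seqlen_q > 1 ||
          (q.max_seqlen_q == 1 && !is_causal(q.mask)));
}

// The 9.20 clause from this PR: no restriction on sq or on the mask type.
constexpr bool clause_9_20(const Query& q) {
  return q.sm_arch >= 100 && q.cudnn_runtime_version >= 92000 && !q.paged_kv;
}

// The combination in question: sq = 1, causal, non-paged, fprop, Blackwell, cuDNN 9.20.
constexpr Query decode_step{100, 92000, false, false, 1, MaskType::Causal};
static_assert(!clause_9_10_2(decode_step), "rejected by the 9.10.2 clause");
static_assert(clause_9_20(decode_step), "accepted by the 9.20 clause");
```

Because the head-dim conditions are OR-ed together, acceptance by the 9.20 clause alone is enough for this combination to reach the later gates, which is why backend support needs to be confirmed.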
/te-ci jax L0
Signed-off-by: zmelumian972 <zmelumian@gmail.com>
Signed-off-by: zmelumian972 <zmelumian@gmail.com>
Force-pushed from dcef948 to d217bf9.
Summary
- Update the cudnn-frontend submodule to version 1.22 (97f6cb3b)
- Add a cuDNN 9.20 path in nvte_get_fused_attn_backend for Blackwell (SM arch >= 100) that supports any head dimension, both forward and backward passes, non-paged layouts, and any sequence length
Changes
- 3rdparty/cudnn-frontend: Bump submodule from 7b9b711c to 97f6cb3b (cuDNN frontend v1.22)
- transformer_engine/common/fused_attn/fused_attn.cpp: Add cuDNN 9.20 backend selection condition:
  - Enable the FusedAttn_F16_Arbitrary_Seqlen backend for SM >= 100 + cuDNN >= 9.20 + non-paged KV layouts
  - Change && to || to correctly OR the two Blackwell conditions
Test plan
🤖 Generated with Claude Code