Add MXFP8 attention #2719

Open · cyanguwa wants to merge 234 commits into NVIDIA:main from cyanguwa:add_mxfp8


@cyanguwa (Collaborator) commented Mar 1, 2026

Description

  1. Added MXFP8 support to the FusedAttention backend (forward and backward)
  2. Decoupled the input and output formats in the APIs by introducing o_format, do_format, and dqkv_layout
  3. Implemented mxfp8_quantize_fast_path(), which quantizes multiple tensors and pads, permutes, and swizzles their scale_invs for a more efficient quantization pipeline; added qkv_scale_inv_format and do_scale_inv_format to indicate the format of the scale_invs
  4. Implemented multi_tensor_transpose_to_bhsd() to permute tensors from BSHD/SBHD to BHSD, with three paths: a TMA path optimized for FP16/BF16 with D=192/128, a fallback_vec_aligned path for Byte with D=8/4, and a fallback_non_vec_aligned path for Byte with D=6
  5. Implemented multi_tensor_pad_last_dim() to pad the last dimension D of multiple tensors to a multiple of 4 for rowwise and a multiple of 128 for columnwise
  6. Implemented multi_tensor_swizzle_row_scaling_narrow_k_kernel and multi_tensor_swizzle_col_scaling_narrow_m_kernel to optimize swizzling for small K and small M dimensions, respectively
  7. Added MXFP8PaddedSizes, pad_s_d_for_mxfp8, generateMatrixStridesWithFormat, generateMatrixStridesWithLayout for size/stride computation
  8. Fixed a bug in the O/dQKV shape logic for MLA; added the utility nvte_convert_qkv_shape
  9. Refactored context parallelism (CP) and added MXFP8 support for cp_comm_type in {'a2a', 'p2p', 'a2a+p2p', 'all_gather'}: all four types are supported for MHA/GQA/MQA/MLA, {'a2a', 'all_gather'} for SWA, and only {'a2a'} for sink attention
  10. Fixed the scale_inv_offsets calculation in GroupedTensor
  11. MXFP8 attention requires cudnn-frontend v1.21+ and cuDNN 9.21+
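The layout permutation in item 4 and the padding rule in item 5 can be illustrated with a small pure-Python sketch. It assumes a contiguous row-major buffer and reads "%4"/"%128" as "round D up to a multiple of 4/128". The names round_up, padded_last_dim, and to_bhsd are hypothetical and are not Transformer Engine APIs; the actual implementations are CUDA kernels that process many tensors in one launch.

```python
from typing import List, Sequence


def round_up(x: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= x."""
    return ((x + multiple - 1) // multiple) * multiple


def padded_last_dim(d: int, rowwise: bool) -> int:
    # Assumption: "%4" / "%128" in item 5 means rounding D up to a multiple
    # of 4 for rowwise tensors and of 128 for columnwise tensors.
    return round_up(d, 4 if rowwise else 128)


def to_bhsd(x: Sequence[float], layout: str, b: int, s: int, h: int, d: int) -> List[float]:
    """Hypothetical reference permutation of a flat row-major BSHD/SBHD buffer to BHSD."""
    if len(x) != b * s * h * d:
        raise ValueError("buffer size does not match b*s*h*d")
    if layout not in ("bshd", "sbhd"):
        raise ValueError(f"unsupported layout: {layout}")
    out = [0.0] * len(x)
    for bi in range(b):
        for hi in range(h):
            for si in range(s):
                # Start offset of the D-length row in the source layout.
                if layout == "bshd":
                    src = ((bi * s + si) * h + hi) * d
                else:  # "sbhd"
                    src = ((si * b + bi) * h + hi) * d
                # Start offset of the same row in the BHSD destination.
                dst = ((bi * h + hi) * s + si) * d
                out[dst:dst + d] = x[src:src + d]
    return out


if __name__ == "__main__":
    print(padded_last_dim(6, rowwise=True))     # 8
    print(padded_last_dim(192, rowwise=False))  # 256
    print(to_bhsd([0, 1, 2, 3], "bshd", b=1, s=2, h=2, d=1))  # [0, 2, 1, 3]
```

The sketch copies whole D-length rows because only the B, S, and H axes move during the permutation; a GPU kernel would copy these rows in parallel rather than in nested loops.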

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

See the Description above.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
This reverts commit d9ff566.
Review comment thread on transformer_engine/common/util/ptx.cuh (outdated)
Review comment thread on transformer_engine/pytorch/csrc/extensions/attention.cpp (outdated)
This reverts commit 3b854b2.
@greptile-apps (Contributor) commented Apr 17, 2026

Review comment thread on transformer_engine/common/fused_attn/fused_attn.cpp
@cyanguwa (Collaborator, Author)

/te-ci L1

@cyanguwa requested review from ptrendx and timmoon10 on April 17, 2026, 04:01
cyanguwa and others added 6 commits April 17, 2026 11:38
@cyanguwa (Collaborator, Author)

/te-ci L1

@greptile-apps (Contributor) commented Apr 17, 2026

Greptile encountered an error while reviewing this PR. Please reach out to support@greptile.com for assistance.

3 participants