Skip to content

gfx1250 mxfp8 gemm: loosen restrictions on K#627

Open
matthiasdiener wants to merge 2 commits into
devfrom
mdiener/loosen-gfx1250-mxfp8-k-restriction
Open

gfx1250 mxfp8 gemm: loosen restrictions on K#627
matthiasdiener wants to merge 2 commits into
devfrom
mdiener/loosen-gfx1250-mxfp8-k-restriction

Conversation

@matthiasdiener

@matthiasdiener matthiasdiener commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Description

Loosen restrictions on K on gfx1250 mxfp8 gemm (K must be multiple of 32), confirmed with hipblaslt developers.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@matthiasdiener matthiasdiener self-assigned this Jun 16, 2026
@matthiasdiener matthiasdiener added the ci-level 1 CI test level 1 label Jun 16, 2026
@matthiasdiener matthiasdiener changed the title gfx1250 gemm: loosen restrictions on K gfx1250 mxfp8 gemm: loosen restrictions on K Jun 16, 2026
// Check that K is compatible with the MXFP8 scale layout, and M/N are multiples of 16
if (inputA->scaling_mode == NVTE_MXFP8_1D_SCALING || inputB->scaling_mode == NVTE_MXFP8_1D_SCALING) {
const bool is_gfx1250 = cuda::sm_arch() == 125;
const int required_k_multiple = is_gfx1250 ? 32 : 128;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO here to change this for gfx950 after scale preswizzle is in hipblasLt.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in 3a7dd8f

}
if (params.k % 128) {
GTEST_SKIP() << "MXFP8 requires K to be a multiple of 128";
const size_t required_k_multiple = (prop.major == 12 && prop.minor == 5) ? 32 : 128;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be under HIP_PLATFORM_AMD below

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of 3a7dd8f? Or do you prefer to move the whole if (use_mxfp8) part into the #ifdef __HIP_PLATFORM_AMD__ below?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-level 1 CI test level 1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants