Metal backend: Add gather_qmv kernel for MoE expert-indexed quantized matmul #18877

Open
manuelcandales wants to merge 3 commits into gh/manuelcandales/171/head from gh/manuelcandales/172/head
@manuelcandales manuelcandales commented Apr 14, 2026

Adds a gather_qmv Metal kernel for Mixture-of-Experts (MoE). It performs a
per-expert quantized matrix-vector multiply, y[i] = W[expert_idx[i]] @ x[i].
The kernel extends the existing qmv kernels in op_linear_4bit.mm with
expert index-based pointer offsets, following the same pattern as MLX's
affine_gather_qmv_fast.
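
For readers unfamiliar with the gather pattern, the semantics of y[i] = W[expert_idx[i]] @ x[i] can be sketched in plain Python. This is an illustration only, not the reference CPU impl from this PR: it uses unquantized float weights, and the function name `gather_matvec` and the nested-list layout are assumptions made for the example.

```python
def gather_matvec(weights, x, expert_idx):
    """Illustrative gather matrix-vector multiply (hypothetical helper).

    weights:    list of E matrices, each a list of N rows of K floats
    x:          list of M input vectors, each of length K
    expert_idx: list of M ints, expert_idx[i] selects the matrix for x[i]
    Returns a list of M output vectors, each of length N.
    """
    out = []
    for xi, e in zip(x, expert_idx):
        w = weights[e]  # pick the expert's weight matrix for this row
        # Ordinary matrix-vector product: one dot product per output row.
        out.append([sum(wk * xk for wk, xk in zip(row, xi)) for row in w])
    return out


# Two experts (identity and 2 * identity), two inputs routed to experts 1 and 0.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
inputs = [[1.0, 2.0], [3.0, 4.0]]
result = gather_matvec(experts, inputs, [1, 0])
```

In the kernel described here, the lookup of `weights[e]` corresponds to offsetting the weight pointer by the expert index before running the usual qmv loop.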

Two dispatch paths (matching op_linear_4bit.mm):

  • gather_qmv_fast: optimized path for K%512==0 and N%8==0
  • gather_qmv_impl: generic fallback for any K and N
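
The choice between the two paths above can be sketched as a simple predicate. The divisibility conditions come from the description; the function name `select_gather_qmv_path` is hypothetical, and the actual host dispatch in this PR is written in C++ and may check additional conditions.

```python
def select_gather_qmv_path(K: int, N: int) -> str:
    """Return the kernel path name for the given shape (hypothetical helper)."""
    # Fast path only when K is a multiple of 512 and N is a multiple of 8.
    if K % 512 == 0 and N % 8 == 0:
        return "gather_qmv_fast"
    # Any other K and N falls back to the generic kernel.
    return "gather_qmv_impl"
```

For example, a 4096 x 4096 projection would take the fast path, while a shape with K = 500 would fall back to the generic kernel.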

Uses the same affine INT4 dequantization format as op_linear_4bit.mm
(scale * accum + sum * bias). Instantiated for 4-bit with group sizes
{32, 64, 128} and dtypes {float, bfloat16}.
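
The expression scale * accum + sum * bias follows from factoring the affine dequantization out of the dot product within one quantization group. A minimal sketch, assuming `accum` is the dot product of the raw 4-bit values with the inputs and `sum` is the sum of the inputs in the group (the helper names and the tiny group of 4 are for illustration only; the kernel uses groups of 32, 64, or 128):

```python
def dot_dequantize_first(q, x, scale, bias):
    # Dequantize each weight as scale * q + bias, then take the dot product.
    return sum((scale * qj + bias) * xj for qj, xj in zip(q, x))


def dot_factored(q, x, scale, bias):
    # Same result with one multiply by scale and one by bias per group.
    accum = sum(qj * xj for qj, xj in zip(q, x))  # raw integer weights times inputs
    x_sum = sum(x)                                # sum of the inputs in the group
    return scale * accum + x_sum * bias


q = [0, 15, 7, 3]           # 4-bit quantized weights, each in [0, 15]
x = [1.0, -2.0, 0.5, 4.0]   # input values for this group
scale, bias = 0.25, -2.0

direct = dot_dequantize_first(q, x, scale, bias)
factored = dot_factored(q, x, scale, bias)
```

Both forms agree, and the factored form avoids materializing dequantized weights inside the inner loop.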

Includes: Metal shader + C++ host dispatch, Python custom op definition
(metal::gather_qmv) with reference CPU impl and Meta impl, C shim dict,
fallback kernel registration, CMakeLists entry, and test module.

Authored with Claude.

[ghstack-poisoned]
pytorch-bot bot commented Apr 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18877

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 15 Cancelled Jobs, 8 Pending

As of commit 59f88db with merge base 5707e2a:

NEW FAILURE - The following job has failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
@manuelcandales manuelcandales marked this pull request as ready for review April 14, 2026 22:24
[ghstack-poisoned]
Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

1 participant