Metal backend: Add gated delta rule kernel for linear attention by manuelcandales · Pull Request #18878 · pytorch/executorch

manuelcandales · 2026-04-14T16:25:34Z

Adds a Metal kernel for the gated delta rule recurrence used by Qwen 3.5
MoE's GatedDeltaNet linear attention layers. Ported from the Metal shader
in the MLX delegate PR (#18785). The kernel processes the full sequence
sequentially within a single GPU dispatch, keeping the recurrent state in
per-thread registers.
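
For readers unfamiliar with the recurrence, here is a minimal pure-Python sketch of a gated delta rule step for a single head. It is not the code in this PR: the function name `gated_delta_rule_ref`, the argument layout, and the convention that `decay` is already a multiplicative gate in (0, 1] are assumptions made for illustration, and the kernel's exact formula may differ.

```python
def gated_delta_rule_ref(q, k, v, decay, beta, state):
    """Sequential gated delta rule for one head (illustrative sketch).

    q, k:   T x Dk lists of floats
    v:      T x Dv list of floats
    decay:  T multiplicative gates (assumed already in (0, 1])
    beta:   T write strengths
    state:  Dk x Dv list of lists, updated in place
    Returns the T x Dv outputs.
    """
    dk, dv = len(state), len(state[0])
    outputs = []
    for t in range(len(q)):
        # 1. Decay the whole state by the gate for this step.
        for i in range(dk):
            for j in range(dv):
                state[i][j] *= decay[t]
        # 2. Value the memory currently predicts for key k_t.
        pred = [sum(state[i][j] * k[t][i] for i in range(dk)) for j in range(dv)]
        # 3. Delta rule: correct toward v_t, scaled by beta_t.
        delta = [beta[t] * (v[t][j] - pred[j]) for j in range(dv)]
        # 4. Rank-1 update of the state with the outer product of k_t and delta.
        for i in range(dk):
            for j in range(dv):
                state[i][j] += k[t][i] * delta[j]
        # 5. Read the updated state out with the query.
        outputs.append(
            [sum(state[i][j] * q[t][i] for i in range(dk)) for j in range(dv)]
        )
    return outputs


# Tiny example: Dk=2, Dv=1, two time steps.
state = [[0.0], [0.0]]
out = gated_delta_rule_ref(
    q=[[1.0, 0.0], [1.0, 0.0]],
    k=[[1.0, 0.0], [1.0, 0.0]],
    v=[[2.0], [2.0]],
    decay=[1.0, 0.5],
    beta=[1.0, 0.5],
    state=state,
)
print(out)    # [[2.0], [1.5]]
print(state)  # [[1.5], [0.0]]
```

Because each step depends on the state left by the previous one, the time loop cannot be parallelized directly, which is consistent with the kernel walking the sequence inside one dispatch.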

Grid: [32, Dv, B*Hv], Threadgroup: [32, 4, 1]. Each thread in a 32-lane
simdgroup handles Dk/32 elements of the key dimension, and dot products
are completed with a SIMD reduction across the simdgroup.
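
The launch arithmetic can be sketched in plain Python. This is an illustration under assumptions, not the host dispatch code: it assumes the grid is expressed in threads (as with a dispatch-threads style launch), that Dk is a multiple of 32, and that each lane owns a contiguous slice of the key dimension. The helper names `launch_geometry` and `simd_dot` are hypothetical.

```python
SIMD_WIDTH = 32  # lanes per simdgroup, per the PR description


def launch_geometry(batch, num_v_heads, dk, dv, threadgroup=(32, 4, 1)):
    """Derive launch sizes from the grid/threadgroup shapes in the description."""
    if dk % SIMD_WIDTH != 0:
        raise ValueError("Dk must be a multiple of the SIMD width")
    grid = (SIMD_WIDTH, dv, batch * num_v_heads)  # assumed to be in threads
    # Ceiling division per axis gives the number of threadgroups.
    groups = tuple(-(-g // t) for g, t in zip(grid, threadgroup))
    threads_per_group = threadgroup[0] * threadgroup[1] * threadgroup[2]
    return {
        "grid_threads": grid,
        "threadgroups": groups,
        "simdgroups_per_threadgroup": threads_per_group // SIMD_WIDTH,
        "key_elems_per_thread": dk // SIMD_WIDTH,
    }


def simd_dot(a, b):
    """Emulate a dot product split across 32 lanes, then reduced.

    Each lane sums its own slice (contiguous slices are an assumption);
    the final sum stands in for the hardware SIMD reduction.
    """
    per_lane = len(a) // SIMD_WIDTH
    partials = [
        sum(a[i] * b[i] for i in range(lane * per_lane, (lane + 1) * per_lane))
        for lane in range(SIMD_WIDTH)
    ]
    return sum(partials)


# Real-model dimensions from the description, with a batch of 1 (assumed).
print(launch_geometry(batch=1, num_v_heads=32, dk=128, dv=128))
```

With Dk=128 each thread would hold 4 key elements per state column, and with Dk=64 it would hold 2, which is what makes keeping the state in registers plausible.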

The op mutates the recurrent state buffer in place (declared via
mutates_args). The kernel is instantiated for both the real model
(Dk=128, Dv=128, Hk=32, Hv=32) and the tiny test (Dk=64, Dv=64, Hk=4,
Hv=4) dimensions.

Includes: Metal shader + C++ host dispatch, Python custom op definition
(metal::gated_delta_rule) with reference CPU impl and Meta impl, C shim
dict, fallback kernel registration, CMakeLists entry, and test module.

Authored with Claude.

[ghstack-poisoned]

manuelcandales · 2026-04-14T16:25:35Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2026-04-14T16:26:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18878

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

Rolling out OSDC (ARC) runners on pull workflow for PyTorch trunk commits

⏳ No Failures, 147 Pending

As of commit 60ca500 with merge base 5707e2a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Update

805a09d

manuelcandales requested review from kirklandsign, larryliu0820 and shoumikhin as code owners April 14, 2026 16:25

meta-cla bot added the CLA Signed label Apr 14, 2026

manuelcandales marked this pull request as draft April 14, 2026 16:27

Update

eba74c4

manuelcandales marked this pull request as ready for review April 14, 2026 22:24

Update

60ca500

manuelcandales wants to merge 3 commits into gh/manuelcandales/172/head from gh/manuelcandales/173/head
