Skip the zeroed-destination gather in matricized contraction #201

Merged
mtfishman merged 2 commits into main from mf/contract-skip-output-gather
Jul 2, 2026
mtfishman (Member) commented Jul 2, 2026

Summary

When β is a strong zero, contractopadd! for the Matricize algorithm no longer matricizes a_dest to form the mul! target. Matricizing the destination gathers all of its (about-to-be-overwritten) blocks into a matrix that mul! then immediately overwrites, which for a graded output is a full block-by-block copy. Instead the matmul allocates its result directly as a1_mat * a2_mat, and that is scattered into a_dest. Every coupled-sector block is materialized (the matmul zeros the ones it does not reach), so the scatter overwrites a_dest in full.
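The difference between the two paths can be sketched on a plain dense array whose output needs a permutation. This is only an illustration under stated assumptions: `contract_gather!` and `contract_scatter!` are hypothetical names, not functions from the package, and `permutedims` stands in for the block-by-block gather and scatter of a graded array. The destination is pre-filled with `NaN` to show that its old contents never influence the result.

```julia
using LinearAlgebra

# Hypothetical stand-in for the old path: gather the destination into matmul
# order, let mul! overwrite it, then scatter the result back.
function contract_gather!(a_dest, a1_mat, a2_mat, perm)
    gathered = permutedims(a_dest, perm)  # copies values that are about to be discarded
    # reshape of an Array shares memory with `gathered`
    dest_mat = reshape(gathered, size(a1_mat, 1), size(a2_mat, 2))
    mul!(dest_mat, a1_mat, a2_mat)        # 3-arg mul! overwrites every entry
    permutedims!(a_dest, gathered, invperm(perm))
    return a_dest
end

# Hypothetical stand-in for the new path: allocate the product directly,
# then scatter it into the destination.
function contract_scatter!(a_dest, a1_mat, a2_mat, perm)
    prod_mat = a1_mat * a2_mat
    matmul_dims = map(i -> size(a_dest, i), perm)  # destination dims in matmul order
    permutedims!(a_dest, reshape(prod_mat, matmul_dims), invperm(perm))
    return a_dest
end

a1_mat, a2_mat = rand(2, 5), rand(5, 12)
perm = (2, 3, 1)
dest_old = fill(NaN, 4, 2, 3)
dest_new = fill(NaN, 4, 2, 3)
contract_gather!(dest_old, a1_mat, a2_mat, perm)
contract_scatter!(dest_new, a1_mat, a2_mat, perm)
```

Both functions allocate one matrix of the output's size, but only the first one copies the destination's stale contents before the matmul.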

A new matricizepermaliases predicate keeps the existing zero-copy path for a dense output already in matmul-aligned order, where matricizing a_dest returns a reshape/transpose view that mul! writes straight through. That case, and any nonzero β, still take the in-place mul! path. The predicate is pure (fusion style plus matricizekind) and defaults to false, so a graded output always takes the new path and a dense aligned output never does.
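A trait-style predicate of this shape might look like the sketch below. The type names, `aliases_sketch`, and `use_inplace_mul` are all hypothetical and do not reflect the actual signature of matricizepermaliases; the sketch also assumes a strong zero β is spelled as the literal `false`, which may differ from how the package detects it.

```julia
using LinearAlgebra

# Hypothetical trait types standing in for "fusion style" and "matricize kind".
abstract type SketchFusionStyle end
struct DenseFusion <: SketchFusionStyle end
struct GradedFusion <: SketchFusionStyle end

abstract type SketchMatricizeKind end
struct AlignedKind <: SketchMatricizeKind end
struct PermutingKind <: SketchMatricizeKind end

# Pure predicate: depends only on the two traits, never on array contents.
aliases_sketch(::SketchFusionStyle, ::SketchMatricizeKind) = false  # default
aliases_sketch(::DenseFusion, ::AlignedKind) = true                 # matricizing is a view

# Assumption: a strong zero β is the literal `false`.
# Take the in-place mul! path when the destination aliases or β is not a strong zero.
use_inplace_mul(style, kind, β) = aliases_sketch(style, kind) || β !== false

# In the aliasing case the matricized destination shares memory with a_dest,
# so mul! writes straight through with no gather or scatter.
a1_mat, a2_mat = rand(2, 5), rand(5, 12)
a_dest = zeros(2, 3, 4)
dest_mat = reshape(a_dest, 2, 12)
mul!(dest_mat, a1_mat, a2_mat)
```

Keeping the predicate a function of traits only means the choice of path is fixed by the types involved rather than by the runtime contents of the arrays.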

This removes one redundant write pass over the output, with allocations unchanged. On order-4 permuting graded contractions it measures around a 10% speedup: more when the matmul is a small fraction of the work, less as the matmul dominates.

codecov Bot commented Jul 2, 2026


Codecov Report

❌ Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.76%. Comparing base (0890e18) to head (cf17b2e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/matricize.jl | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #201      +/-   ##
==========================================
+ Coverage   83.74%   83.76%   +0.01%     
==========================================
  Files          24       24              
  Lines         732      739       +7     
==========================================
+ Hits          613      619       +6     
- Misses        119      120       +1     

| Flag | Coverage Δ |
|---|---|
| docs | 25.35% <55.55%> (+0.06%) ⬆️ |


mtfishman added 2 commits July 2, 2026 14:23
When β is a strong zero, contractopadd! for the Matricize algorithm no longer matricizes a_dest to form the mul! target. Matricizing the destination gathers all of its about-to-be-overwritten blocks into a matrix that mul! then immediately overwrites, which for a graded output is a full block-by-block copy. Instead the matmul allocates its result directly as a1_mat * a2_mat, scattered into a_dest. A new matricizepermaliases predicate keeps the existing zero-copy write-through for a dense output already in matmul-aligned order.
@mtfishman mtfishman force-pushed the mf/contract-skip-output-gather branch from 0b9ae6d to cf17b2e Compare July 2, 2026 18:25
@mtfishman mtfishman enabled auto-merge (squash) July 2, 2026 18:26
@mtfishman mtfishman merged commit ddf4dd7 into main Jul 2, 2026
21 checks passed
@mtfishman mtfishman deleted the mf/contract-skip-output-gather branch July 2, 2026 18:50