Speed up named-tensor broadcasting by mtfishman · Pull Request #199 · ITensor/ITensorBase.jl

mtfishman · 2026-07-01T17:57:05Z

Summary

Reworks elementwise broadcasting on named tensors (a + b, a .+ b, c .= a .+ b) to skip Base's named-axis broadcast machinery and lower directly onto TensorAlgebra's linear-combination path over the backing arrays, aligning operands by dimension name.

A named broadcast previously rebuilt NamedUnitRange axes through combine_axes/broadcast_shape on every call and ran a runtime promote_op inference call for the result element type, because a named tensor's eltype is not inferrable. A 2x2 add allocated 52 times. Now instantiate is a no-op for the named style, the destination names come from dimnames of the operands rather than axes(bc), and the unnamed work runs behind a function barrier where the result element type is inferrable. The same 2x2 add drops to 10 allocations (3.19 KiB to 464 B) and is several times faster, and larger dense adds improve as well.
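The function-barrier idea above can be sketched in plain Julia. This is a minimal illustration, not the PR's code: `ToyNamed`, `add_kernel`, and `toy_add` are hypothetical names, and the wrapper deliberately stores its backing array in an abstractly typed field so that the element type cannot be inferred from the wrapper type.

```julia
# Hypothetical stand-in for a named tensor. The `data` field is abstractly
# typed, so the compiler cannot infer the element type from `ToyNamed` alone.
struct ToyNamed
    data::AbstractArray
    names::Tuple{Vararg{Symbol}}
end

# Kernel behind the barrier. Julia specializes this method on the concrete
# array types it receives, so the result element type is inferable in here.
add_kernel(x::AbstractArray, y::AbstractArray) = x .+ y

# Outer function: unwraps the operands and pays for one dynamic dispatch
# into the kernel instead of a runtime element-type query.
function toy_add(a::ToyNamed, b::ToyNamed)
    a.names == b.names || throw(ArgumentError("names must match in this sketch"))
    return ToyNamed(add_kernel(a.data, b.data), a.names)
end

a = ToyNamed([1 2; 3 4], (:i, :j))
b = ToyNamed([0.5 0.5; 0.5 0.5], (:i, :j))
c = toy_add(a, b)
```

Under these assumptions, the only type-unstable step is the call into `add_kernel`; everything inside the kernel runs on concrete types.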

materialize! is intercepted for the named style so the in-place path no longer reconstructs the broadcast over axes(dest) and re-enters the axis machinery. In-place is now cheaper than out-of-place, as it should be. An operand already aligned to the destination names takes a fast path that returns its backing array untouched, and an operand whose dimension order differs is aligned behind a function barrier that builds the permutation with a static length, dropping the previous aligneddims round trip and keeping the permuted case fast.

With the axis machinery off every path, the combine_axes/broadcast_shape/promote_shape/check_broadcast_shape overloads and the broadcasted similar are removed. axes and similar on a raw lazy named Broadcasted are no longer supported, which nothing relies on: materialization goes through dimnames, and the non-linear fused fallback runs on the unnamed broadcast.


codecov · 2026-07-01T17:59:54Z

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.00%. Comparing base (df02533) to head (2061c94).

Files with missing lines	Patch %	Lines
src/broadcast.jl	86.20%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #199      +/-   ##
==========================================
+ Coverage   73.44%   74.00%   +0.55%     
==========================================
  Files          28       28              
  Lines        1529     1500      -29     
==========================================
- Hits         1123     1110      -13     
+ Misses        406      390      -16

Flag	Coverage Δ
docs	`24.55% <63.63%> (-0.18%)`	⬇️


Keep `unnamed(a, names)` a general permute-to-names method and handle the already-aligned shortcut in `broadcasted_unnamed`, its only caller. The shortcut is load-bearing: without it every aligned operand (including the always-aligned first one) is wrapped in an identity `PermutedDimsArray`, which breaks the clean lowering and makes a small add several times slower.
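A rough sketch of the split described in this commit message, using only Base Julia. The names `permute_to_names` and `aligned_data` are hypothetical stand-ins for the general permute-to-names method and the caller-side shortcut; they are not the package's API.

```julia
# Hypothetical general method: view `data`, whose dimensions are labeled
# `from`, in the order given by `to`. It always wraps, even when the
# permutation is the identity.
function permute_to_names(
        data::AbstractArray{<:Any, N}, from::NTuple{N, Symbol}, to::NTuple{N, Symbol}
    ) where {N}
    # `something` throws if a destination name is missing from `from`.
    perm = ntuple(i -> something(findfirst(==(to[i]), from)), Val(N))
    return PermutedDimsArray(data, perm)
end

# Hypothetical caller-side shortcut: an operand already in destination
# order is returned untouched, with no wrapper around it.
function aligned_data(data, from, to)
    from == to && return data
    return permute_to_names(data, from, to)
end

x = [1 2 3; 4 5 6]                                  # dims (:i, :j), size (2, 3)
same = aligned_data(x, (:i, :j), (:i, :j))          # fast path
swapped = aligned_data(x, (:i, :j), (:j, :i))       # permuted view
wrapped = permute_to_names(x, (:i, :j), (:i, :j))   # identity wrapper
```

In this sketch the fast path hands back the very same array object, while calling the general method directly on an aligned operand still produces a `PermutedDimsArray`, which is the identity wrapper the shortcut is meant to avoid.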

Behind a function barrier that recovers the concrete backing array, `ndims` is a compile-time constant, so the permutation can be built as an `ntuple(..., Val(ndims))` (an `NTuple{N,Int}`) rather than `Tuple(getperm(...))` (a `Tuple{Vararg{Int}}` whose length is not inferrable). That lets `permuteddims` build a concretely-typed wrapper, cutting the cost of a permuted add by roughly 20% (a 2x2 permuted add goes from about 1130 ns to about 900 ns). Aligned adds are unaffected, taking the alignment fast path in `broadcasted_unnamed`.
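The difference between the two ways of building the permutation can be illustrated with Base Julia alone. This is a sketch under assumptions: `dynamic_perm`, `static_perm`, and `permuted_add` are hypothetical names, and `PermutedDimsArray` stands in for whatever wrapper `permuteddims` builds.

```julia
# Permutation built through a Vector: the tuple length is only known at
# run time, so the inferred type is Tuple{Vararg{Int}}.
dynamic_perm(from, to) = Tuple(Int[something(findfirst(==(n), from)) for n in to])

# Permutation built with ntuple and Val(N): the length is part of the type,
# so the result is an NTuple{N,Int}.
function static_perm(from::NTuple{N, Symbol}, to::NTuple{N, Symbol}) where {N}
    return ntuple(i -> something(findfirst(==(to[i]), from)), Val(N))
end

# Function barrier: N comes from the concrete array type, so the permuted
# wrapper can be built from a tuple of statically known length.
function permuted_add(
        x::AbstractArray{<:Any, N}, y::AbstractArray{<:Any, N},
        xnames::NTuple{N, Symbol}, ynames::NTuple{N, Symbol}
    ) where {N}
    perm = static_perm(ynames, xnames)   # reorder y's dims to match x's names
    return x .+ PermutedDimsArray(y, perm)
end

x = [1 2; 3 4]        # dims (:i, :j)
y = [10 20; 30 40]    # dims (:j, :i)
z = permuted_add(x, y, (:i, :j), (:j, :i))
```

Both helpers return the same values for the same names; only the static version carries the tuple length in its type, which is what the commit message relies on.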

mtfishman added 2 commits July 1, 2026 14:13

mtfishman merged commit 94614ee into main Jul 1, 2026
18 checks passed

mtfishman deleted the mf/named-broadcast-instantiate branch July 1, 2026 18:53

mtfishman merged 3 commits into main from mf/named-broadcast-instantiate
