
ET path: share interaction graph + analytic pair site_grads#327

Open
jameskermode wants to merge 2 commits into main from opt/et-graph-sharing


jameskermode commented Jun 29, 2026

Collaborator

Optimises the ET (EquivariantTensors) backend force path. The bottlenecks addressed here were found by profiling with benchmark/profile_et_forces.jl.

Originally stacked on #326 (base fix/classic-force-evaluation-regression) so that the diff showed only the ET changes. #326 has since merged, and this PR has been retargeted to main.

Changes

1. Shared interaction graph in StackedCalculator. Previously, each stacked component (onebody, pair, many-body) rebuilt its own interaction graph on every call. A per-cutoff cache (gcache, keyed on each calculator's own rcut) is now threaded through the stacked energy/forces/virial/efv calls, so components that share a cutoff (pair + many-body) build the graph only once. Because the key is each calculator's own cutoff, per-component cutoffs stay exact (no single-graph approximation). Components that are not a WrappedSiteCalculator fall back to the plain AtomsCalculators interface, and the _wrapped_* helpers gain methods that accept a prebuilt graph.

2. Analytic site_grads for ETPairModel (replaces Zygote.gradient). For the pair model, site_basis_jacobian returns ∂𝔹 == ∂R directly because the pair basis is a linear sum over neighbours, so contracting it with the readout weights gives exactly the per-edge gradient. There is no Jacobian blow-up, and coupling to EquivariantTensors stays low (only site_basis_jacobian and rev_reshape_embedding are used).
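As a rough, language-agnostic illustration of item 1, the Python sketch below threads a per-cutoff graph cache through a stacked evaluation. It is not the ACEpotentials (Julia) code: `Graph`, `SiteComponent`, `PlainComponent`, `stacked_energy` and `build_graph` are hypothetical stand-ins, and the cache is assumed to be keyed on exact equality of each component's own `rcut`.

```python
from typing import Callable, Dict


class Graph:
    """Hypothetical stand-in for an interaction graph built at one cutoff."""

    def __init__(self, rcut: float):
        self.rcut = rcut


class SiteComponent:
    """Hypothetical graph-aware component (plays the role of a wrapped site calculator)."""

    def __init__(self, name: str, rcut: float, e: float):
        self.name, self.rcut, self.e = name, rcut, e

    def energy(self, graph: Graph) -> float:
        # Each component only ever sees a graph built at its own cutoff.
        assert graph.rcut == self.rcut
        return self.e  # placeholder energy


class PlainComponent:
    """Hypothetical component without graph support: uses the plain interface."""

    def __init__(self, name: str, e: float):
        self.name, self.e = name, e

    def energy_from_structure(self, structure) -> float:
        return self.e  # placeholder energy


def stacked_energy(components, structure,
                   build_graph: Callable[[object, float], Graph]) -> float:
    gcache: Dict[float, Graph] = {}  # one graph per distinct cutoff, for this call only
    total = 0.0
    for c in components:
        if isinstance(c, SiteComponent):
            if c.rcut not in gcache:  # first component with this cutoff builds the graph
                gcache[c.rcut] = build_graph(structure, c.rcut)
            total += c.energy(gcache[c.rcut])
        else:
            total += c.energy_from_structure(structure)  # fallback path
    return total


# Usage: record how many graphs get built for a onebody + pair + many-body stack.
builds = []


def counting_build(structure, rcut):
    builds.append(rcut)
    return Graph(rcut)


components = [PlainComponent("onebody", -1.0),
              SiteComponent("pair", 5.5, 0.25),
              SiteComponent("manybody", 5.5, 0.5)]
E = stacked_energy(components, structure=None, build_graph=counting_build)
```

With a shared cutoff only one graph is built; a component with a different cutoff would simply get its own cache entry, which is what keeps every cutoff exact.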

Not changed: the many-body ETACE.site_grads stays on Zygote. An analytic VJP via ET._ka_pullback was prototyped, but it was only ~10–15% faster (the cost is dominated by ET's many-body kernel intermediates, not by Zygote overhead) and coupled too tightly to EquivariantTensors internals, so it was not worth it.
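The analytic pair gradient in item 2 works because the pair basis is a plain sum over neighbours, B_k = Σ_j R_k(r_ij), so the per-edge gradient of E_i = Σ_k w_k B_k is (Σ_k w_k R_k'(r_ij)) x_ij / r_ij and needs no automatic differentiation. The Python sketch below shows this with a hypothetical polynomial radial basis R_k(r) = (1 − r/rcut)^(k+2) and hypothetical names (`site_energy`, `site_grads`); it is not the ETPairModel basis or API, and it includes a central finite-difference helper for checking the analytic result.

```python
import math

RCUT = 5.0  # hypothetical cutoff


def radial_basis(r, nmax):
    """Hypothetical radial basis R_k(r) = (1 - r/RCUT)^(k+2), zero for r >= RCUT."""
    s = max(0.0, 1.0 - r / RCUT)
    return [s ** (k + 2) for k in range(nmax)]


def radial_basis_deriv(r, nmax):
    """dR_k/dr for the basis above."""
    s = max(0.0, 1.0 - r / RCUT)
    return [-(k + 2) / RCUT * s ** (k + 1) for k in range(nmax)]


def site_energy(edges, weights):
    """E_i = sum_k w_k B_k, where B_k = sum_j R_k(|x_ij|) is linear in the neighbours."""
    B = [0.0] * len(weights)
    for x in edges:
        for k, Rk in enumerate(radial_basis(math.hypot(*x), len(weights))):
            B[k] += Rk
    return sum(w * b for w, b in zip(weights, B))


def site_grads(edges, weights):
    """Analytic per-edge gradients dE_i/dx_ij = (sum_k w_k R_k'(r_ij)) * x_ij / r_ij.

    Each edge enters B only through its own term, so its Jacobian row is
    R'(r_ij) * x_ij / r_ij; contracting it with the weights gives the gradient.
    """
    grads = []
    for x in edges:
        r = math.hypot(*x)
        dE_dr = sum(w * d for w, d in zip(weights, radial_basis_deriv(r, len(weights))))
        grads.append([dE_dr * c / r for c in x])
    return grads


def finite_difference_grads(edges, weights, h=1e-6):
    """Central differences of site_energy, used only to check site_grads."""
    grads = []
    for i, x in enumerate(edges):
        g = []
        for d in range(3):
            xp, xm = list(x), list(x)
            xp[d] += h
            xm[d] -= h
            ep = site_energy(edges[:i] + [xp] + edges[i + 1:], weights)
            em = site_energy(edges[:i] + [xm] + edges[i + 1:], weights)
            g.append((ep - em) / (2 * h))
        grads.append(g)
    return grads


edges = [(1.0, 0.5, 0.0), (-2.0, 1.0, 1.5), (0.0, 0.0, 3.0)]  # edge vectors x_ij
weights = [0.7, -1.2, 0.4]  # hypothetical readout weights
analytic = site_grads(edges, weights)
numeric = finite_difference_grads(edges, weights)
```

For a many-body basis the basis functions are products over several neighbours, so this per-edge shortcut no longer applies, which is consistent with ETACE staying on Zygote.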

Results (single-thread, Si/O)

Full ET stacked forces   Before     After      Time reduction
64 atoms                 6.95 ms    5.06 ms    ~27%
256 atoms                24.9 ms    20.0 ms    ~19%
800 atoms                71.4 ms    61.5 ms    ~14%

The ET/classic force-time ratio drops from 2.31 to 2.06. ET remains ~2× slower than the classic path on CPU because its dominant cost is ET's many-body kernels, so the larger remaining win lies upstream (leaner kernels, GPU) rather than in ACEpotentials.
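The percentages in the results table are reductions in wall time, 1 − after/before, which correspond to speedup factors of roughly 1.16× (800 atoms) to 1.37× (64 atoms). A small Python check of that arithmetic, using the timings copied from the table:

```python
# (atoms, before_ms, after_ms), copied from the results table
timings = [(64, 6.95, 5.06), (256, 24.9, 20.0), (800, 71.4, 61.5)]


def reduction(before: float, after: float) -> float:
    """Fractional reduction in wall time, 1 - after/before."""
    return 1.0 - after / before


def speedup(before: float, after: float) -> float:
    """Speedup factor, before/after."""
    return before / after


for atoms, before, after in timings:
    print(f"{atoms:>4} atoms: {100 * reduction(before, after):.1f}% less time, "
          f"{speedup(before, after):.2f}x speedup")
```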

Verification

  • test/et_models/test_et_calculators.jl (ET↔classic forces/virial/energy <1e-6, incl. StackedCalculator), test/etmodels/test_etace.jl, test_etpair.jl — all pass.
  • Forces bit-consistent with the classic calculator.

🤖 Generated with Claude Code

@jameskermode

Collaborator Author

Currently, graphs are only shared between components with equal cutoffs, which is the case for the pair and many-body models in the performance tests above. An alternative would be to compute a single graph at max(rcut) and filter it for the components with smaller cutoffs. Open to suggestions on whether this is worth the extra bookkeeping.
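One way to picture the max(rcut) alternative: run a single neighbour search at the largest cutoff and derive each smaller-cutoff graph by filtering edges on their length. The Python sketch below uses a brute-force, non-periodic search and hypothetical names (`build_edges`, `graphs_by_filtering`); a real ET graph would also carry periodic cell shifts and per-edge data that need the same filtering, which is the extra bookkeeping in question.

```python
import math
from itertools import permutations


def build_edges(positions, rcut):
    """Brute-force O(N^2) neighbour search without periodic images.

    Returns directed edges (i, j, r_ij) for all i != j with r_ij < rcut.
    """
    edges = []
    for i, j in permutations(range(len(positions)), 2):
        r = math.dist(positions[i], positions[j])
        if r < rcut:
            edges.append((i, j, r))
    return edges


def graphs_by_filtering(positions, rcuts):
    """One neighbour search at max(rcuts); each smaller cutoff is just an edge filter."""
    full = build_edges(positions, max(rcuts))
    return {rc: [e for e in full if e[2] < rc] for rc in set(rcuts)}


positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 3.0, 0.0), (4.0, 4.0, 0.0)]
rcuts = [3.5, 5.5]  # hypothetical per-component cutoffs
graphs = graphs_by_filtering(positions, rcuts)
```

Because filtering preserves edge order, each filtered list matches a direct search at that cutoff; the saving is one neighbour search per extra cutoff, at the cost of a pass over the largest edge list.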

@jameskermode jameskermode changed the base branch from fix/classic-force-evaluation-regression to main June 29, 2026 15:25
jameskermode and others added 2 commits June 29, 2026 16:27
Two optimisations to the ET (EquivariantTensors) backend force path, found by
profiling (benchmark/profile_et_forces.jl):

1. Shared interaction graph in StackedCalculator. Previously each stacked
   component (onebody, pair, many-body) rebuilt its own interaction graph per
   force/energy call. Now a per-cutoff cache (`gcache`, keyed on each
   calculator's own `rcut`) is threaded through the stacked calls, so components
   that share a cutoff (pair + many-body) build the graph once. Keying on each
   calculator's own cutoff keeps per-component cutoffs exact — no single-graph
   approximation. Non-WrappedSiteCalculator components fall back to the plain
   AtomsCalculators interface. `_wrapped_*` gain graph-accepting methods.

2. Analytic `site_grads` for ETPairModel, replacing `Zygote.gradient`. For the
   pair model `site_basis_jacobian` returns ∂𝔹 == ∂R directly (the pair basis is
   a linear sum over neighbours), so contracting it with the readout weights is
   exactly the per-edge gradient — no jacobian blow-up, low coupling (uses only
   site_basis_jacobian / rev_reshape_embedding).

The many-body ETACE `site_grads` is left on Zygote: an analytic VJP was
prototyped but only ~10-15% faster (ET's many-body kernel intermediates, not
Zygote overhead, dominate) and coupled too tightly to ET internals.

Result (single-thread, Si/O): full ET stacked forces ~14-27% faster
(64 atoms 6.95→5.06 ms, 800 atoms 71.4→61.5 ms); ET/classic force ratio
2.31→2.06. Forces unchanged — test/et_models/test_et_calculators.jl (ET↔classic
<1e-6), test/etmodels/test_etace.jl, test_etpair.jl all pass.

Also adds benchmark/profile_et_forces.jl (ET force-path breakdown).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ETOneBody is structurally a one-body model — its energy depends only on atom
species (node_data), `site_grads` returns empty edge gradients unconditionally,
and forces/virial are identically zero. Building an interaction graph for it (the
`convert2et_full` onebody used rcut=3.0) ran a neighbour search whose edges were
then discarded.

Specialise the ETOneBodyPotential energy/forces/virial (and the StackedCalculator
`_cached_*` dispatch) to build the node states directly and skip the graph
entirely. node_data is rcut-independent, so the result is identical to the graph
path. Also drop the now-meaningless rcut=3.0 in convert2et_full (→ 0.0, unused).

Verified: test/et_models/test_et_calculators.jl passes (onebody energy + stacked
energy/forces/virial unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
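Why the one-body shortcut in the second commit is exact can be shown with a minimal Python sketch: the energy is a lookup on species alone, and forces and virial are identically zero, so no neighbour search is needed. The function names and reference energies (`onebody_energy`, `e0`) are hypothetical and are not the ETOneBodyPotential API.

```python
from typing import Dict, List, Sequence, Tuple


def onebody_energy(species: Sequence[str], e0: Dict[str, float]) -> float:
    """Sum of per-species reference energies: depends on species only, not positions."""
    return sum(e0[s] for s in species)


def onebody_forces(species: Sequence[str]) -> List[Tuple[float, float, float]]:
    """The energy does not depend on positions, so every force is exactly zero."""
    return [(0.0, 0.0, 0.0) for _ in species]


def onebody_virial() -> List[List[float]]:
    """Likewise the 3x3 virial is identically zero."""
    return [[0.0] * 3 for _ in range(3)]


species = ["Si", "O", "O"]
e0 = {"Si": -2.0, "O": -0.5}  # made-up reference energies
E = onebody_energy(species, e0)
F = onebody_forces(species)
```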
jameskermode force-pushed the opt/et-graph-sharing branch from dd03745 to 7288ed1 June 29, 2026 15:28