Fix ~13x force-evaluation regression in classic ACEModel by jameskermode · Pull Request #326 · ACEsuit/ACEpotentials.jl

jameskermode · 2026-06-29T12:30:09Z

Summary

The default ace1_model / classic ACEModel force evaluation regressed ~13× in the v0.10 (EquivariantTensors) series versus the previous release (0.9.1). This PR fixes it and adds regression tooling.

Root cause

A type instability in evaluate_ed (src/models/ace.jl). EquivariantTensors.pullback is not type-stable (returns Tuple{Any,Any}), so ∂Rnl/∂Ylm reached the inline gradient-assembly loops as Any, making every element-wise product dynamically dispatched. The numerical kernels were ~30 µs, but the call took ~400 µs and allocated ~764 KB/site — ~80% of the time was in the assembly loops.

The originally-suspected cause (ET autograd forces) was refuted by benchmark: the ET path is not the regression; in fact the broken classic path was slower than ET.

Fix

Move the gradient assembly into a function barrier (_assemble_grad_ed!) so Julia re-specialises on the concrete runtime types.

	before	after
`evaluate_ed` (40 neighbours)	~386 µs	~30 µs
classic forces, 512 atoms	190.8 ms	18.3 ms
cross-version vs 0.9.1 (512 atoms)	~13× slower	~1.15×

Forces are bit-identical (max|Δg| ≈ 1e-15).

Regression tooling (prevent recurrence)

benchmark/common.jl — shared model/system setup
benchmark/bench_forces_regression.jl — classic vs ET forces/energy (time + allocations)
benchmark/bench_crossversion.jl (+ worker) — release-vs-release comparison in isolated envs
benchmark/profile_classic_ed.jl — evaluate_ed component profile
benchmark/benchmarks.jl — PkgBenchmark SUITE
.github/workflows/Benchmark.yml — non-blocking BenchmarkCI judge (PR vs base on same runner, threads pinned)
benchmark/README.md, benchmark/FORCE_REGRESSION_FINDINGS.md

Verification

test/models/test_ace.jl and test/models/test_calculator.jl pass (finite-difference force checks, Zygote-through-loss).

Follow-up (not in this PR)

Optimisations to the ET path (analytic site_grads to remove 12–154 MB/eval Zygote allocations; shared interaction graph across stacked calculators) — to be planned separately.

🤖 Generated with Claude Code

The default `ace1_model` / classic `ACEModel` force evaluation regressed ~13x in the v0.10 (EquivariantTensors) series versus 0.9.1. Root cause: a type instability in `evaluate_ed` (src/models/ace.jl). `EquivariantTensors.pullback` is not type-stable (returns Tuple{Any,Any}), so `∂Rnl` / `∂Ylm` reached the inline gradient-assembly loops as `Any`, making every element-wise product dynamically dispatched. The numerical kernels were ~30us but the call took ~400us and allocated ~764 KB/site, with ~80% of the time spent in the assembly loops. Fix: move the assembly into a function barrier (`_assemble_grad_ed!`) so Julia re-specialises on the concrete runtime types. evaluate_ed drops from ~386us to ~30us for 40 neighbours; forces are bit-identical (max|Δg| ≈ 1e-15). Cross-version regression vs 0.9.1 collapses from ~13x to ~1.15x. Note: the originally-suspected cause (ET autograd forces) was refuted by benchmark — the ET path is not the regression. Also adds performance regression tooling under benchmark/ to prevent recurrence: - common.jl: shared model/system setup - bench_forces_regression.jl: classic vs ET forces/energy (time + allocs) - bench_crossversion.jl (+ worker): release-vs-release comparison - profile_classic_ed.jl: evaluate_ed component profile - benchmarks.jl: PkgBenchmark suite - .github/workflows/Benchmark.yml: non-blocking BenchmarkCI judge (PR vs base) - README.md, FORCE_REGRESSION_FINDINGS.md Verified: test/models/test_ace.jl and test/models/test_calculator.jl pass (finite-difference force checks, Zygote-through-loss). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The Benchmark job failed at `Pkg.add("BenchmarkCI")` — the workflow only added the ACEregistry, so the General registry was absent and BenchmarkCI could not be resolved. Also, BenchmarkCI's judge-against-base cannot work on this PR: the benchmark suite does not exist on `main` yet, so there is no baseline to compare to. Switch to a robust run-and-report approach: - add both the General and ACE registries - run the PkgBenchmark suite via `benchmarkpkg(".")` (verified locally) - post results to the job summary - keep it non-blocking, threads pinned Base-branch `judge` (e.g. via BenchmarkCI) can be enabled in a follow-up once the suite is on `main`. Docs updated accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The previous attempt resolved the registries but failed with "Package BenchmarkTools not found": PkgBenchmark propagates the *calling* process's active project to the benchmark subprocess, and the step ran from the global environment instead of benchmark/Project.toml. Instantiate and run the suite with `--project=benchmark` (matches the verified local invocation `julia --project=benchmark -e 'benchmarkpkg(".")'`). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jameskermode · 2026-06-29T13:32:26Z

@cortner OK to merge now or should we wait and fix in ET? My opinion is the function barrier does no harm and makes it work with current ET.

cortner · 2026-06-29T13:38:31Z

this is definitely ok for me to merge, this would be ok like this even if we fix the type instability in ET

jameskermode · 2026-06-29T14:57:13Z

I'll plan to merge this and #327 tag a patch release to get the regression fix to users

jameskermode and others added 3 commits June 29, 2026 13:29

cortner mentioned this pull request Jun 29, 2026

Type Instability in pullback ACEsuit/EquivariantTensors.jl#135

Open

This was referenced Jun 29, 2026

Fix type instability in SparseACEbasis pullback (#135) ACEsuit/EquivariantTensors.jl#136

Open

ET path: share interaction graph + analytic pair site_grads #327

Open

jameskermode merged commit 266f84e into main Jun 29, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix ~13x force-evaluation regression in classic ACEModel#326

Fix ~13x force-evaluation regression in classic ACEModel#326
jameskermode merged 3 commits into
mainfrom
fix/classic-force-evaluation-regression

jameskermode commented Jun 29, 2026

Uh oh!

jameskermode commented Jun 29, 2026

Uh oh!

cortner commented Jun 29, 2026

Uh oh!

jameskermode commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jameskermode commented Jun 29, 2026

Summary

Root cause

Fix

Regression tooling (prevent recurrence)

Verification

Follow-up (not in this PR)

Uh oh!

jameskermode commented Jun 29, 2026

Uh oh!

cortner commented Jun 29, 2026

Uh oh!

jameskermode commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants