Skip to content

Fix ~13x force-evaluation regression in classic ACEModel#326

Merged
jameskermode merged 3 commits into
mainfrom
fix/classic-force-evaluation-regression
Jun 29, 2026
Merged

Fix ~13x force-evaluation regression in classic ACEModel#326
jameskermode merged 3 commits into
mainfrom
fix/classic-force-evaluation-regression

Conversation

@jameskermode

Copy link
Copy Markdown
Collaborator

Summary

The default ace1_model / classic ACEModel force evaluation regressed ~13× in the v0.10 (EquivariantTensors) series versus the previous release (0.9.1). This PR fixes it and adds regression tooling.

Root cause

A type instability in evaluate_ed (src/models/ace.jl). EquivariantTensors.pullback is not type-stable (returns Tuple{Any,Any}), so ∂Rnl/∂Ylm reached the inline gradient-assembly loops as Any, making every element-wise product dynamically dispatched. The numerical kernels were ~30 µs, but the call took ~400 µs and allocated ~764 KB/site — ~80% of the time was in the assembly loops.

The originally-suspected cause (ET autograd forces) was refuted by benchmark: the ET path is not the regression; in fact the broken classic path was slower than ET.

Fix

Move the gradient assembly into a function barrier (_assemble_grad_ed!) so Julia re-specialises on the concrete runtime types.

before after
evaluate_ed (40 neighbours) ~386 µs ~30 µs
classic forces, 512 atoms 190.8 ms 18.3 ms
cross-version vs 0.9.1 (512 atoms) ~13× slower ~1.15×

Forces are bit-identical (max|Δg| ≈ 1e-15).

Regression tooling (prevent recurrence)

  • benchmark/common.jl — shared model/system setup
  • benchmark/bench_forces_regression.jl — classic vs ET forces/energy (time + allocations)
  • benchmark/bench_crossversion.jl (+ worker) — release-vs-release comparison in isolated envs
  • benchmark/profile_classic_ed.jlevaluate_ed component profile
  • benchmark/benchmarks.jl — PkgBenchmark SUITE
  • .github/workflows/Benchmark.yml — non-blocking BenchmarkCI judge (PR vs base on same runner, threads pinned)
  • benchmark/README.md, benchmark/FORCE_REGRESSION_FINDINGS.md

Verification

test/models/test_ace.jl and test/models/test_calculator.jl pass (finite-difference force checks, Zygote-through-loss).

Follow-up (not in this PR)

Optimisations to the ET path (analytic site_grads to remove 12–154 MB/eval Zygote allocations; shared interaction graph across stacked calculators) — to be planned separately.

🤖 Generated with Claude Code

jameskermode and others added 3 commits June 29, 2026 13:29
The default `ace1_model` / classic `ACEModel` force evaluation regressed
~13x in the v0.10 (EquivariantTensors) series versus 0.9.1.

Root cause: a type instability in `evaluate_ed` (src/models/ace.jl).
`EquivariantTensors.pullback` is not type-stable (returns Tuple{Any,Any}),
so `∂Rnl` / `∂Ylm` reached the inline gradient-assembly loops as `Any`,
making every element-wise product dynamically dispatched. The numerical
kernels were ~30us but the call took ~400us and allocated ~764 KB/site,
with ~80% of the time spent in the assembly loops.

Fix: move the assembly into a function barrier (`_assemble_grad_ed!`) so
Julia re-specialises on the concrete runtime types. evaluate_ed drops from
~386us to ~30us for 40 neighbours; forces are bit-identical (max|Δg| ≈ 1e-15).
Cross-version regression vs 0.9.1 collapses from ~13x to ~1.15x.

Note: the originally-suspected cause (ET autograd forces) was refuted by
benchmark — the ET path is not the regression.

Also adds performance regression tooling under benchmark/ to prevent
recurrence:
- common.jl: shared model/system setup
- bench_forces_regression.jl: classic vs ET forces/energy (time + allocs)
- bench_crossversion.jl (+ worker): release-vs-release comparison
- profile_classic_ed.jl: evaluate_ed component profile
- benchmarks.jl: PkgBenchmark suite
- .github/workflows/Benchmark.yml: non-blocking BenchmarkCI judge (PR vs base)
- README.md, FORCE_REGRESSION_FINDINGS.md

Verified: test/models/test_ace.jl and test/models/test_calculator.jl pass
(finite-difference force checks, Zygote-through-loss).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Benchmark job failed at `Pkg.add("BenchmarkCI")` — the workflow only
added the ACEregistry, so the General registry was absent and BenchmarkCI
could not be resolved.

Also, BenchmarkCI's judge-against-base cannot work on this PR: the benchmark
suite does not exist on `main` yet, so there is no baseline to compare to.

Switch to a robust run-and-report approach:
- add both the General and ACE registries
- run the PkgBenchmark suite via `benchmarkpkg(".")` (verified locally)
- post results to the job summary
- keep it non-blocking, threads pinned

Base-branch `judge` (e.g. via BenchmarkCI) can be enabled in a follow-up once
the suite is on `main`. Docs updated accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous attempt resolved the registries but failed with
"Package BenchmarkTools not found": PkgBenchmark propagates the *calling*
process's active project to the benchmark subprocess, and the step ran from
the global environment instead of benchmark/Project.toml.

Instantiate and run the suite with `--project=benchmark` (matches the verified
local invocation `julia --project=benchmark -e 'benchmarkpkg(".")'`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jameskermode

Copy link
Copy Markdown
Collaborator Author

@cortner OK to merge now or should we wait and fix in ET? My opinion is the function barrier does no harm and makes it work with current ET.

@cortner

cortner commented Jun 29, 2026

Copy link
Copy Markdown
Member

this is definitely ok for me to merge, this would be ok like this even if we fix the type instability in ET

@jameskermode

Copy link
Copy Markdown
Collaborator Author

I'll plan to merge this and #327 tag a patch release to get the regression fix to users

@jameskermode jameskermode merged commit 266f84e into main Jun 29, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants