fix(pt): sort nlist for compressed se_e2_a in forward_lower#5524
Conversation
The compressed `tabulate_fusion_se_a` op uses an `is_sorted` early-termination that stops accumulating at the first neighbor whose env-mat direction is zero (padding, or an out-of-rcut neighbor with sw==0), assuming such neighbors are trailing. The C++/LAMMPS `forward_lower` neighbor list (rcut+skin, not distance-sorted) can interleave these zero-direction neighbors before real ones, so the op silently drops real neighbors, producing wrong energy/forces and unstable MD. Only the compressed path is affected: the uncompressed embedding -net path sums over neighbors and treats zero-direction fillers identically regardless of position, and only `tabulate_fusion_se_a` (forward and grad) has this early-termination (se_t/se_r do not). Make `DescrptBlockSeA.need_sorted_nlist_for_lower()` return `self.compress`. The model already wires `format_nlist(..., extra_nlist_sort=need_sorted_nlist_for_lower())`, so when compression is enabled this forces the sort + rcut-filter branch, which puts in-rcut neighbors first and all padding last, restoring the op's invariant. The standard (uncompressed) route is unchanged. Add a regression test that runs compressed `forward_lower` with an unsorted, over-rcut neighbor list and compares energy and force to the uncompressed reference, parameterized over type_one_side. It fails without this fix and passes with it. Reported in discussion deepmodeling#5438.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRequire sorted neighbor lists for compressed SE_A by returning the compression flag from need_sorted_nlist_for_lower(). Add regression tests (SE_A and SE_R) that verify compressed forward_lower matches uncompressed references when given intentionally unsorted, over-rcut neighbor lists. ChangesCompressed SE_A sorted neighbor list fix
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #5524 +/- ##
=======================================
Coverage 82.19% 82.19%
=======================================
Files 891 891
Lines 101599 101600 +1
Branches 4242 4242
=======================================
+ Hits 83507 83509 +2
+ Misses 16789 16787 -2
- Partials 1303 1304 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
njzjz
left a comment
There was a problem hiding this comment.
Does the same problem exist in se_e2_r?
Companion to test_compressed_se_a_forward_lower.py. Confirms se_e2_r is immune to the unsorted/over-rcut forward_lower nlist that broke compressed se_a (discussion deepmodeling#5438): tabulate_fusion_se_r has no is_sorted early-termination and reduces over neighbors order-independently, so need_sorted_nlist_for_lower() correctly stays False for se_r. Expected to pass with no production code change.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/tests/pt/model/test_compressed_se_r_forward_lower.py`:
- Around line 123-134: The test must explicitly assert that the reversed
neighbor list actually contains both in-rcut and out-of-rcut neighbors before
calling forward_lower: after you get nlist2 from
extend_input_and_build_neighbor_list (keep a copy before flipping), compute
neighbor distances by gathering neighbor coordinates from coord (use the
returned nlist copy and coord.unsqueeze(0)), compare those distances to rcut to
form boolean masks, and assert at least one True (distance <= rcut) and at least
one False (distance > rcut); only then proceed to flip nlist2 and call
self.model.forward_lower so the regression precondition is enforced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6dfa7d18-ce88-4e0b-a143-5c6c69ae7602
📒 Files selected for processing (1)
source/tests/pt/model/test_compressed_se_r_forward_lower.py
The first version compared compressed over-cut vs uncompressed CLEAN, which conflated compression accuracy with se_r's intrinsic nlist-representation sensitivity (se_r's mean reduction, unlike se_a, is not invariant to clean vs over-cut nlist layout -> a ~1e-4 uncompressed-only gap). Compare compressed vs uncompressed on the IDENTICAL unsorted over-cut nlist instead, which isolates the op: it matches to ~1e-16, confirming tabulate_fusion_se_r has no order/is_sorted bug, while still catching an se_a-style divergence.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
source/tests/pt/model/test_compressed_se_r_forward_lower.py (1)
125-128:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAssert the regression precondition explicitly before the reference run.
The test assumes the generated system always yields both in-rcut and out-of-rcut neighbors after reversal, but never asserts that condition. If seed or device behavior changes, the test can become a false-positive guard.
🛡️ Proposed precondition check
nlist = torch.flip(nlist, dims=[-1]) + # Guard the intended scenario: reversed nlist must include both + # in-rcut and out-of-rcut neighbors for this configuration. + coord0 = ec[:, : coord.shape[0], :] + safe_nlist = torch.where(nlist >= 0, nlist, torch.zeros_like(nlist)) + gather_idx = safe_nlist.view(1, -1, 1).expand(-1, -1, 3) + nei_coord = torch.gather(ec, 1, gather_idx).view(1, coord.shape[0], -1, 3) + rr = torch.linalg.norm(nei_coord - coord0[:, :, None, :], dim=-1) + real = nlist >= 0 + self.assertTrue(torch.any(real & (rr <= rcut)), "No in-rcut neighbors found") + self.assertTrue(torch.any(real & (rr > rcut)), "No out-of-rcut neighbors found") # reference: uncompressed forward_lower on this exact nlist ref = self.model.forward_lower(ec, ea, nlist, mp, do_atomic_virial=False)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/tests/pt/model/test_compressed_se_r_forward_lower.py` around lines 125 - 128, The test must explicitly assert the regression precondition that the flipped neighbor list produces both in-rcut and out-of-rcut neighbors before calling forward_lower: compute distances for the pairs in nlist (using ec and the edge attributes ea or the model's distance utility) and assert there is at least one distance <= self.model.cutoff and at least one distance > self.model.cutoff (or use mp.rcut if available), then only call ref = self.model.forward_lower(ec, ea, nlist, mp, do_atomic_virial=False); this ensures nlist (after torch.flip) actually contains both in- and out-of-range neighbors required by the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@source/tests/pt/model/test_compressed_se_r_forward_lower.py`:
- Around line 125-128: The test must explicitly assert the regression
precondition that the flipped neighbor list produces both in-rcut and
out-of-rcut neighbors before calling forward_lower: compute distances for the
pairs in nlist (using ec and the edge attributes ea or the model's distance
utility) and assert there is at least one distance <= self.model.cutoff and at
least one distance > self.model.cutoff (or use mp.rcut if available), then only
call ref = self.model.forward_lower(ec, ea, nlist, mp, do_atomic_virial=False);
this ensures nlist (after torch.flip) actually contains both in- and
out-of-range neighbors required by the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b16614aa-77d3-456a-a8d1-e6d052d5f333
📒 Files selected for processing (1)
source/tests/pt/model/test_compressed_se_r_forward_lower.py
…anch) The ~1e-4 clean-vs-over-cut gap is NOT se_r-specific reduction sensitivity (my earlier note was wrong). It is format_nlist's pad branch (width == nnei, over-rcut neighbors, no re-sort/rcut-filter) being order-dependent -- the same root condition as deepmodeling#5438 -- and it affects se_a (~4e-6) and se_r (~1e-4) identically in the uncompressed path. mixed_types=True vs False is bit-identical and not involved. The test still compares compressed vs uncompressed on the same nlist to cancel that shared effect and isolate the (bug-free) se_r op.
…ch (deepmodeling#5529) ## Summary `forward_lower`'s `_format_nlist` only filtered out-of-`rcut` neighbors in its **sort** branch (`n_nnei > nnei or extra_nlist_sort`). When the input neighbor-list width is `<= nnei` (`sum(sel)`) and `extra_nlist_sort` is `False`, it took the **pad** branch and returned the nlist unchanged — never dropping neighbors beyond `rcut`. The C++/LAMMPS path (`DeepPotPT.cc` → `copy_from_nlist` → `padding`) builds the neighbor list with `rcut + skin` and does **not** rcut-filter before `forward_lower` (the in-code comment in `commonPT.h` is explicit: *"No truncation or distance sorting is done — the model's format_nlist handles that"*). Its width is the per-atom neighbor count, which on sparse systems is `<= nnei` — exactly the case in discussion deepmodeling#5438 (width 39 < 100). In that regime, out-of-`rcut` neighbors leak into the descriptor. Because the pad branch does not re-sort, the leaked contribution is **order-dependent**: reversing the nlist changes the energy by ~`1e-4` (se_r) / ~`4e-6` (se_a). This is the same root condition as deepmodeling#5438; that PR (deepmodeling#5524) closed it only for *compressed* se_a (by forcing `extra_nlist_sort`). The uncompressed paths and se_r/se_t remained exposed. ## Fix The pad branch now also drops neighbors with `rr > rcut` (no re-sort — these descriptors reduce over neighbors order-independently). Applied to **both** the `pt` and `dpmodel` backends (`dpmodel` is shared by `pt_expt`). The exported `pt_expt` graph is unaffected: export forces `extra_nlist_sort=True`, so it always takes the sort branch; the new pad-branch code is eager-only. ## Verification - New regression test `test_format_nlist_overcut.py`: over-cut (`rcut+2`) `forward_lower`, as-is and reversed, must match the `rcut`-bounded reference for se_a and se_r (energy + force, `1e-10`). The reversed cases **fail without this fix** and pass with it. - After the fix: se_a reversed over-cut `rel = 0.0`, se_r `rel = 4.4e-16` (were `4.3e-6` / `1.4e-4`). - Existing suites green on GPU: `test_jit` (all models), `test_forward_lower`, `test_permutation`, `pt_expt` descriptor + ener-model export, universal dp/pt model consistency. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Summary by CodeRabbit * **Bug Fixes** * Improved neighbor-list handling to drop neighbor candidates whose distances exceed the cutoff when using over-cut neighbor lists, preserving neighbor order while ensuring out-of-cutoff entries are excluded. * **Tests** * Added regression coverage for over-cut neighbor lists in the PT backend and common DP model backend, validating energy/forces consistency across descriptor types and forward/reversed neighbor ordering. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
Summary
Fixes wrong energy/forces (and unstable LAMMPS MD) for compressed
se_e2_amodels evaluated throughforward_lower— the C++/LAMMPS inference path. Reported in discussion #5438.Root cause
The compressed
tabulate_fusion_se_aop (source/lib/src/tabulate.cc, forward and grad kernels) has anis_sorted-gated early-termination:It stops accumulating at the first neighbor whose env-mat direction is zero. Both
-1padding and out-of-rcutneighbors (sw==0) have zero direction and the sameem_x == -davg/dstd(== ago), so the op assumes all such neighbors are trailing.is_sorteddefaults totrueand the PT op never overrides it.The C++/LAMMPS
forward_lowerneighbor list usesrcut + skinand is not distance-sorted._format_nlistonly filters out-of-rcutneighbors in its sort branch (n_nnei > nnei), which is skipped when the LAMMPS list is narrower thansum(sel)(pad-only branch). The zero-direction neighbors then land before real ones, so the op breaks early and silently drops real neighbors → wrong descriptor → wrong energy/forces → unstable MD.Only the compressed path is affected:
tabulate_fusion_se_a(forward + grad) has this early-termination;se_t/se_rforward kernels do not.It is device-independent (reproduces identically on CPU and GPU).
Fix
The wiring already exists — the model calls
format_nlist(..., extra_nlist_sort=self.need_sorted_nlist_for_lower())— butDescrptBlockSeA.need_sorted_nlist_for_lower()always returnedFalse. Make it returnself.compress:When compression is enabled this forces the sort +
rcut-filter branch (in-rcutneighbors first, all padding last), restoring the op's invariant. The standard (uncompressed) route is unchanged, so there is no added cost on the common path.Verification
source/tests/pt/model/test_compressed_se_a_forward_lower.pyruns compressedforward_lowerwith an unsorted, over-rcutneighbor list and compares energy + force to the uncompressed reference, parameterized overtype_one_side ∈ {True, False}. It fails without this fix and passes with it; existingtest_compressed_descriptor_se_a.pyandtest_forward_lower.pystill pass.Summary by CodeRabbit
Bug Fixes
Tests