fix(pt): sort nlist for compressed se_e2_a in forward_lower by wanghan-iapcm · Pull Request #5524 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-06-13T01:17:45Z

Summary

Fixes wrong energy/forces (and unstable LAMMPS MD) for compressed se_e2_a models evaluated through forward_lower — the C++/LAMMPS inference path. Reported in discussion #5438.

Root cause

The compressed tabulate_fusion_se_a op (source/lib/src/tabulate.cc, forward and grad kernels) has an is_sorted-gated early-termination:

ago = em_x[ii * nnei + nnei - 1];           // last neighbor's em_x
if (ago == xx && ll[1]==0 && ll[2]==0 && ll[3]==0 && is_sorted) break;

It stops accumulating at the first neighbor whose env-mat direction is zero. Both -1 padding and out-of-rcut neighbors (sw==0) have zero direction and the same em_x == -davg/dstd (== ago), so the op assumes all such neighbors are trailing. is_sorted defaults to true and the PT op never overrides it.

The C++/LAMMPS forward_lower neighbor list uses rcut + skin and is not distance-sorted. _format_nlist only filters out-of-rcut neighbors in its sort branch (n_nnei > nnei), which is skipped when the LAMMPS list is narrower than sum(sel) (pad-only branch). The zero-direction neighbors then land before real ones, so the op breaks early and silently drops real neighbors → wrong descriptor → wrong energy/forces → unstable MD.

Only the compressed path is affected:

- The uncompressed embedding-net path sums over neighbors and treats zero-direction fillers identically regardless of position, so it is order-invariant.
- Only tabulate_fusion_se_a (forward + grad) has this early-termination; se_t/se_r forward kernels do not.

It is device-independent (reproduces identically on CPU and GPU).

Fix

The wiring already exists — the model calls format_nlist(..., extra_nlist_sort=self.need_sorted_nlist_for_lower()) — but DescrptBlockSeA.need_sorted_nlist_for_lower() always returned False. Make it return self.compress:

def need_sorted_nlist_for_lower(self) -> bool:
    return self.compress

When compression is enabled this forces the sort + rcut-filter branch (in-rcut neighbors first, all padding last), restoring the op's invariant. The standard (uncompressed) route is unchanged, so there is no added cost on the common path.
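The effect of the sort + rcut-filter branch on a single neighbor-list row can be sketched as follows. This is a simplified illustration, not the `format_nlist` implementation: `toy_sort_branch` and the `dist` lookup are hypothetical, and real code works on batched tensors.

```python
def toy_sort_branch(nlist_row, dist, rcut, nnei):
    """Keep in-rcut neighbors sorted by distance, then pad with -1 to width nnei.

    nlist_row: neighbor indices for one atom, -1 marks padding.
    dist: mapping from neighbor index to its distance from the central atom.
    """
    in_rcut = [j for j in nlist_row if j >= 0 and dist[j] <= rcut]
    in_rcut.sort(key=lambda j: dist[j])
    kept = in_rcut[:nnei]
    return kept + [-1] * (nnei - len(kept))


# Neighbor 2 lies beyond rcut (it came from an rcut + skin list); -1 is padding.
dist = {0: 1.0, 2: 7.5, 3: 2.0}
row = [2, -1, 3, 0]
print(toy_sort_branch(row, dist, rcut=6.0, nnei=4))  # [0, 3, -1, -1]
```

After this step every filler slot trails the real neighbors, which is the layout the early-termination check relies on.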

Verification

- In LAMMPS (CPU and GPU) the compressed model now matches the uncompressed model to ~2.5e-14 (was ~0.5 eV off with scrambled forces).
- New regression test source/tests/pt/model/test_compressed_se_a_forward_lower.py runs compressed forward_lower with an unsorted, over-rcut neighbor list and compares energy + force to the uncompressed reference, parameterized over type_one_side ∈ {True, False}. It fails without this fix and passes with it; existing test_compressed_descriptor_se_a.py and test_forward_lower.py still pass.

Summary by CodeRabbit

Bug Fixes
- Fixed compressed descriptor behavior so enabling compression preserves neighbor ordering invariants and no longer causes valid neighbors to be dropped; energies and forces remain correct with unsorted/padded neighbor lists.
Tests
- Added regression tests for multiple descriptor variants that validate compressed mode against uncompressed baselines using unsorted/over-cut neighbor lists, asserting energy and force fidelity.

The compressed `tabulate_fusion_se_a` op uses an `is_sorted` early-termination that stops accumulating at the first neighbor whose env-mat direction is zero (padding, or an out-of-rcut neighbor with sw==0), assuming such neighbors are trailing. The C++/LAMMPS `forward_lower` neighbor list (rcut+skin, not distance-sorted) can interleave these zero-direction neighbors before real ones, so the op silently drops real neighbors, producing wrong energy/forces and unstable MD.

Only the compressed path is affected: the uncompressed embedding-net path sums over neighbors and treats zero-direction fillers identically regardless of position, and only `tabulate_fusion_se_a` (forward and grad) has this early-termination (se_t/se_r do not).

Make `DescrptBlockSeA.need_sorted_nlist_for_lower()` return `self.compress`. The model already wires `format_nlist(..., extra_nlist_sort=need_sorted_nlist_for_lower())`, so when compression is enabled this forces the sort + rcut-filter branch, which puts in-rcut neighbors first and all padding last, restoring the op's invariant. The standard (uncompressed) route is unchanged.

Add a regression test that runs compressed `forward_lower` with an unsorted, over-rcut neighbor list and compares energy and force to the uncompressed reference, parameterized over type_one_side. It fails without this fix and passes with it.

Reported in discussion deepmodeling#5438.

coderabbitai · 2026-06-13T01:21:36Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 34f1d1e1-60fc-4437-9de9-782c0bb3bfcc

📥 Commits

Reviewing files that changed from the base of the PR and between e732db7 and d50ff94.

📒 Files selected for processing (1)

source/tests/pt/model/test_compressed_se_r_forward_lower.py

🚧 Files skipped from review as they are similar to previous changes (1)

source/tests/pt/model/test_compressed_se_r_forward_lower.py

Walkthrough

Require sorted neighbor lists for compressed SE_A by returning the compression flag from need_sorted_nlist_for_lower(). Add regression tests (SE_A and SE_R) that verify compressed forward_lower matches uncompressed references when given intentionally unsorted, over-rcut neighbor lists.

Changes

Compressed SE_A sorted neighbor list fix

| Layer / File(s) | Summary |
|---|---|
| Descriptor method behavior fix `deepmd/pt/model/descriptor/se_a.py` | `need_sorted_nlist_for_lower()` now returns `self.compress` instead of constant `False`. Documentation explains the compressed tabulation op requires sorted neighbor lists to preserve its is_sorted early-termination invariant. |
| Regression tests for compressed SE_A `source/tests/pt/model/test_compressed_se_a_forward_lower.py` | Add a test module that builds rcut-bounded reference outputs, enables compression using a min-neighbor-distance lower bound, constructs reversed over-rcut neighbor lists (padding/out-of-rcut neighbors first), runs `forward_lower`, and asserts energies and reduced/assembled forces match the uncompressed reference. |
| Regression tests for compressed SE_R `source/tests/pt/model/test_compressed_se_r_forward_lower.py` | Add a test module that runs compressed vs uncompressed `forward_lower` on the same reversed over-rcut FLAT neighbor list for `se_e2_r` and asserts equality for total energy and reduced extended forces. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

njzjz

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically summarizes the main fix: enabling sorted neighbor lists for compressed se_e2_a models in the forward_lower path to prevent incorrect energy/force calculations. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

codecov · 2026-06-13T02:08:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.19%. Comparing base (5d94bd6) to head (d50ff94).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5524   +/-   ##
=======================================
  Coverage   82.19%   82.19%           
=======================================
  Files         891      891           
  Lines      101599   101600    +1     
  Branches     4242     4242           
=======================================
+ Hits        83507    83509    +2     
+ Misses      16789    16787    -2     
- Partials     1303     1304    +1

njzjz

Does the same problem exist in se_e2_r?

Companion to test_compressed_se_a_forward_lower.py. Confirms se_e2_r is immune to the unsorted/over-rcut forward_lower nlist that broke compressed se_a (discussion deepmodeling#5438): tabulate_fusion_se_r has no is_sorted early-termination and reduces over neighbors order-independently, so need_sorted_nlist_for_lower() correctly stays False for se_r. Expected to pass with no production code change.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/tests/pt/model/test_compressed_se_r_forward_lower.py`:
- Around line 123-134: The test must explicitly assert that the reversed
neighbor list actually contains both in-rcut and out-of-rcut neighbors before
calling forward_lower: after you get nlist2 from
extend_input_and_build_neighbor_list (keep a copy before flipping), compute
neighbor distances by gathering neighbor coordinates from coord (use the
returned nlist copy and coord.unsqueeze(0)), compare those distances to rcut to
form boolean masks, and assert at least one True (distance <= rcut) and at least
one False (distance > rcut); only then proceed to flip nlist2 and call
self.model.forward_lower so the regression precondition is enforced.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6dfa7d18-ce88-4e0b-a143-5c6c69ae7602

📥 Commits

Reviewing files that changed from the base of the PR and between fe877eb and afa9d01.

📒 Files selected for processing (1)

source/tests/pt/model/test_compressed_se_r_forward_lower.py

The first version compared compressed over-cut vs uncompressed CLEAN, which conflated compression accuracy with se_r's intrinsic nlist-representation sensitivity (se_r's mean reduction, unlike se_a, is not invariant to clean vs over-cut nlist layout -> a ~1e-4 uncompressed-only gap). Compare compressed vs uncompressed on the IDENTICAL unsorted over-cut nlist instead, which isolates the op: it matches to ~1e-16, confirming tabulate_fusion_se_r has no order/is_sorted bug, while still catching an se_a-style divergence.

coderabbitai

♻️ Duplicate comments (1)

source/tests/pt/model/test_compressed_se_r_forward_lower.py (1)

125-128: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the regression precondition explicitly before the reference run.

The test assumes the generated system always yields both in-rcut and out-of-rcut neighbors after reversal, but never asserts that condition. If seed or device behavior changes, the test can become a false-positive guard.

🛡️ Proposed precondition check

         nlist = torch.flip(nlist, dims=[-1])
+        # Guard the intended scenario: reversed nlist must include both
+        # in-rcut and out-of-rcut neighbors for this configuration.
+        coord0 = ec[:, : coord.shape[0], :]
+        safe_nlist = torch.where(nlist >= 0, nlist, torch.zeros_like(nlist))
+        gather_idx = safe_nlist.view(1, -1, 1).expand(-1, -1, 3)
+        nei_coord = torch.gather(ec, 1, gather_idx).view(1, coord.shape[0], -1, 3)
+        rr = torch.linalg.norm(nei_coord - coord0[:, :, None, :], dim=-1)
+        real = nlist >= 0
+        self.assertTrue(torch.any(real & (rr <= rcut)), "No in-rcut neighbors found")
+        self.assertTrue(torch.any(real & (rr > rcut)), "No out-of-rcut neighbors found")
 
         # reference: uncompressed forward_lower on this exact nlist
         ref = self.model.forward_lower(ec, ea, nlist, mp, do_atomic_virial=False)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt/model/test_compressed_se_r_forward_lower.py` around lines
125-128, the test must explicitly assert the regression precondition that the
flipped neighbor list produces both in-rcut and out-of-rcut neighbors before
calling forward_lower: compute distances for the pairs in nlist (using ec and
the edge attributes ea or the model's distance utility) and assert there is at
least one distance <= self.model.cutoff and at least one distance >
self.model.cutoff (or use mp.rcut if available), then only call ref =
self.model.forward_lower(ec, ea, nlist, mp, do_atomic_virial=False); this
ensures nlist (after torch.flip) actually contains both in- and out-of-range
neighbors required by the test.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b16614aa-77d3-456a-a8d1-e6d052d5f333

📥 Commits

Reviewing files that changed from the base of the PR and between afa9d01 and e732db7.

📒 Files selected for processing (1)

source/tests/pt/model/test_compressed_se_r_forward_lower.py

…anch) The ~1e-4 clean-vs-over-cut gap is NOT se_r-specific reduction sensitivity (my earlier note was wrong). It is format_nlist's pad branch (width == nnei, over-rcut neighbors, no re-sort/rcut-filter) being order-dependent -- the same root condition as deepmodeling#5438 -- and it affects se_a (~4e-6) and se_r (~1e-4) identically in the uncompressed path. mixed_types=True vs False is bit-identical and not involved. The test still compares compressed vs uncompressed on the same nlist to cancel that shared effect and isolate the (bug-free) se_r op.

…ch (deepmodeling#5529)

## Summary

`forward_lower`'s `_format_nlist` only filtered out-of-`rcut` neighbors in its **sort** branch (`n_nnei > nnei or extra_nlist_sort`). When the input neighbor-list width is `<= nnei` (`sum(sel)`) and `extra_nlist_sort` is `False`, it took the **pad** branch and returned the nlist unchanged, never dropping neighbors beyond `rcut`.

The C++/LAMMPS path (`DeepPotPT.cc` → `copy_from_nlist` → `padding`) builds the neighbor list with `rcut + skin` and does **not** rcut-filter before `forward_lower` (the in-code comment in `commonPT.h` is explicit: *"No truncation or distance sorting is done — the model's format_nlist handles that"*). Its width is the per-atom neighbor count, which on sparse systems is `<= nnei`, exactly the case in discussion deepmodeling#5438 (width 39 < 100).

In that regime, out-of-`rcut` neighbors leak into the descriptor. Because the pad branch does not re-sort, the leaked contribution is **order-dependent**: reversing the nlist changes the energy by ~`1e-4` (se_r) / ~`4e-6` (se_a). This is the same root condition as deepmodeling#5438; that PR (deepmodeling#5524) closed it only for *compressed* se_a (by forcing `extra_nlist_sort`). The uncompressed paths and se_r/se_t remained exposed.

## Fix

The pad branch now also drops neighbors with `rr > rcut` (no re-sort: these descriptors reduce over neighbors order-independently). Applied to **both** the `pt` and `dpmodel` backends (`dpmodel` is shared by `pt_expt`).

The exported `pt_expt` graph is unaffected: export forces `extra_nlist_sort=True`, so it always takes the sort branch; the new pad-branch code is eager-only.

## Verification

- New regression test `test_format_nlist_overcut.py`: over-cut (`rcut+2`) `forward_lower`, as-is and reversed, must match the `rcut`-bounded reference for se_a and se_r (energy + force, `1e-10`). The reversed cases **fail without this fix** and pass with it.
- After the fix: se_a reversed over-cut `rel = 0.0`, se_r `rel = 4.4e-16` (were `4.3e-6` / `1.4e-4`).
- Existing suites green on GPU: `test_jit` (all models), `test_forward_lower`, `test_permutation`, `pt_expt` descriptor + ener-model export, universal dp/pt model consistency.

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved neighbor-list handling to drop neighbor candidates whose distances exceed the cutoff when using over-cut neighbor lists, preserving neighbor order while ensuring out-of-cutoff entries are excluded.
* **Tests**
  * Added regression coverage for over-cut neighbor lists in the PT backend and common DP model backend, validating energy/forces consistency across descriptor types and forward/reversed neighbor ordering.

---------

Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
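The pad-branch behavior described in this commit message (drop out-of-`rcut` neighbors, keep the incoming order) can be sketched for a single row as below. This is an illustration under stated assumptions, not the `_format_nlist` code: `toy_pad_branch` and `dist` are hypothetical, and the sketch assumes a dropped entry is replaced in place by `-1`, which the source does not confirm.

```python
def toy_pad_branch(nlist_row, dist, rcut, nnei):
    """Mask neighbors beyond rcut without re-sorting, then pad to width nnei.

    nlist_row: neighbor indices for one atom, -1 marks padding.
    dist: mapping from neighbor index to its distance from the central atom.
    """
    # Assumption: an out-of-rcut neighbor becomes -1 at the same position.
    masked = [j if j >= 0 and dist[j] <= rcut else -1 for j in nlist_row]
    return masked + [-1] * (nnei - len(masked))


dist = {0: 1.0, 2: 7.5, 3: 2.0}
row = [2, 3, 0]  # width 3 <= nnei, so the pad branch applies
print(toy_pad_branch(row, dist, rcut=6.0, nnei=4))        # [-1, 3, 0, -1]
print(toy_pad_branch(row[::-1], dist, rcut=6.0, nnei=4))  # [0, 3, -1, -1]
```

Both orderings keep the same set of real neighbors, {0, 3}, so a reduction that does not depend on neighbor position gives the same result for the as-is and reversed lists.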

dosubot Bot added the bug label Jun 13, 2026

github-actions Bot added the Python label Jun 13, 2026

wanghan-iapcm requested a review from njzjz June 13, 2026 01:19

njzjz reviewed Jun 13, 2026

coderabbitai Bot reviewed Jun 13, 2026

Comment thread source/tests/pt/model/test_compressed_se_r_forward_lower.py Outdated

coderabbitai Bot reviewed Jun 13, 2026

njzjz approved these changes Jun 13, 2026

wanghan-iapcm mentioned this pull request Jun 13, 2026

fix(pt,dpmodel): drop out-of-rcut neighbors in _format_nlist pad branch #5529

Merged

wanghan-iapcm enabled auto-merge June 14, 2026 03:33

wanghan-iapcm added this pull request to the merge queue Jun 14, 2026

njzjz linked an issue Jun 14, 2026 that may be closed by this pull request

the MD running is only working when using CPU #5092

Closed

This was referenced Jun 14, 2026

the MD running is only working when using CPU #5092

Closed

List of critical bugs giving incorrect results without error messages #2866

Open

github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Jun 14, 2026

wanghan-iapcm added this pull request to the merge queue Jun 15, 2026

Merged via the queue into deepmodeling:master with commit beb50da Jun 15, 2026
70 checks passed

wanghan-iapcm deleted the fix/compressed-se-a-unsorted-nlist branch June 15, 2026 06:10

wanghan-iapcm merged 4 commits into deepmodeling:master from wanghan-iapcm:fix/compressed-se-a-unsorted-nlist
