Skip to content

feat(pt/dpa4): add SO3 readout and small bug fix#5556

Open
OutisLi wants to merge 3 commits into
deepmodeling:masterfrom
OutisLi:pr/so3readout
Open

feat(pt/dpa4): add SO3 readout and small bug fix#5556
OutisLi wants to merge 3 commits into
deepmodeling:masterfrom
OutisLi:pr/so3readout

Conversation

@OutisLi

@OutisLi OutisLi commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features

    • Added so3_readout configuration option for SeZM descriptor, enabling selection of computation methods ("none", "glu", "mlp") for final descriptor channel extraction.
  • Improvements

    • Optimized compiler version checks to run only when compilation is actually invoked, reducing unnecessary checks at initialization.
    • Enhanced neighbor list performance with device-aware method selection for both CPU and GPU execution.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Three independent changes: (1) DescrptSeZM gains a so3_readout option ("none", "glu", "mlp") that controls whether the final l=0 descriptor is extracted from a degree-0-only FFN or a full SO(3)-folded readout, wired through argcheck, constructor, forward paths, serialization, and example config; (2) SeZMModel defers check_compile_torch_version() from __init__ to the actual compilation entry points; (3) NvNeighborList suppresses Warp stderr on CPU-only hosts, adds a CPU-specific cell-list threshold, makes method selection device-aware, and passes max_atoms_per_system for batch_naive to avoid device–host sync.

Changes

so3_readout feature for DescrptSeZM

Layer / File(s) Summary
so3_readout contract: argcheck, constructor, validation, example
deepmd/utils/argcheck.py, deepmd/pt/model/descriptor/sezm.py, examples/water/dpa4/input.json
Adds doc_so3_readout and the so3_readout Argument (choices "none"/"glu"/"mlp") to descrpt_se_zm_args; adds the parameter to the DescrptSeZM.__init__ signature with lowercase normalization and ValueError guard; extends the class docstring; sets "mlp" in the example DPA4 config.
output_ffn construction and forward readout logic
deepmd/pt/model/descriptor/sezm.py
output_ffn construction branches on so3_readout to set lmax, ffn_so3_grid, and grid_mlp; forward() and forward_with_edges() build ffn_in from either the l=0 slice or the full coefficient tensor, apply a residual, then slice to l=0; serialize() persists so3_readout.

Deferred compile version check in SeZMModel

Layer / File(s) Summary
compile version check moved to compilation entry points
deepmd/pt/model/model/sezm_model.py
__init__ no longer calls check_compile_torch_version() when use_compile=True; the call is added to trace_and_compile(), compile_dens(), and _trace_lower_exportable() instead.

NvNeighborList CPU-only host improvements

Layer / File(s) Summary
stderr suppression, CPU threshold, device-aware method selection, build kwargs
deepmd/pt/utils/nv_nlist.py
Adds _suppress_native_stderr() context manager used in is_nv_available() on CPU-only hosts; introduces NV_CPU_CELL_LIST_THRESHOLD = 128; extends choose_nv_nlist_method with a device parameter for CPU-vs-GPU threshold selection; NvNeighborList.build passes device to method selection and forwards max_atoms_per_system via extra_nl_kwargs for "batch_naive".

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main changes: introducing SO3 readout functionality (the primary feature across multiple files) and addressing a compilation-related bug fix.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/pt/utils/nv_nlist.py`:
- Around line 70-77: The current code structure risks leaking the file
descriptor saved_fd if either the open() call on devnull or the first dup2()
call raises an exception before reaching the existing try-finally block. To fix
this, wrap the os.dup(stderr_fd) call and the subsequent open() and try-finally
blocks with an outer try-finally statement that ensures os.close(saved_fd) is
called in the finally clause regardless of whether an exception occurs during
the devnull file opening or the initial dup2() redirection.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 313173f5-2344-4891-a24b-7c1d4e7883f7

📥 Commits

Reviewing files that changed from the base of the PR and between 1050e46 and 1d244fd.

📒 Files selected for processing (5)
  • deepmd/pt/model/descriptor/sezm.py
  • deepmd/pt/model/model/sezm_model.py
  • deepmd/pt/utils/nv_nlist.py
  • deepmd/utils/argcheck.py
  • examples/water/dpa4/input.json

Comment thread deepmd/pt/utils/nv_nlist.py
@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.39535% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.85%. Comparing base (1050e46) to head (1d244fd).

Files with missing lines Patch % Lines
deepmd/pt/utils/nv_nlist.py 80.00% 6 Missing ⚠️
deepmd/pt/model/descriptor/sezm.py 87.50% 1 Missing ⚠️
deepmd/pt/model/model/sezm_model.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5556      +/-   ##
==========================================
- Coverage   82.16%   80.85%   -1.32%     
==========================================
  Files         896      896              
  Lines      102643   102676      +33     
  Branches     4340     4342       +2     
==========================================
- Hits        84341    83020    -1321     
- Misses      16965    18324    +1359     
+ Partials     1337     1332       -5     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wanghan-iapcm

Copy link
Copy Markdown
Collaborator

Code review

Found 1 blocking issue (CI is red because of it):

so3_readout is added to the pt serialize format but not to the dpmodel DescrptDPA4 counterpart, breaking DPA4 cross-backend consistency.

The DPA4/SeZM descriptor is not pt-only — deepmd/dpmodel/descriptor/dpa4.py shares the identical serialize format (its deserialize consumes the pt serialize() output). This PR adds so3_readout to the pt side only:

"message_node_so3": self.message_node_so3,
"so3_readout": self.so3_readout,
"lebedev_quadrature": self.lebedev_quadrature,

but deepmd/dpmodel/descriptor/dpa4.py is unchanged (no so3_readout in its __init__, serialize, or output_ffn).

Why this fails: source/tests/consistent/descriptor/test_dpa4.py::...::test_pt_consistent_with_ref serializes the dpmodel descriptor, deserializes it into pt, re-serializes, and asserts the two config dicts are equal. After this PR the pt dict carries a so3_readout key the dpmodel dict lacks, so np.testing.assert_equal(data1, data2) fails — on the default (so3_readout="none") config. That is why every Test Python shard is red, e.g. TestDPA4_float64_1_1_1_False_True_gaussian::test_pt_consistent_with_ref.

There is also a quieter consequence for the non-default modes: dpmodel DescrptDPA4.__init__ has a **kwargs catch-all that silently swallows so3_readout, and its output_ffn is hard-wired to the none behavior. Since this PR makes examples/water/dpa4/input.json default to "so3_readout": "mlp", a model trained from the example would silently produce a different descriptor under the dpmodel / jax / pt_expt backends (no error, wrong numerics).

This is the serialization-discipline rule: a new serialized field must be added to all backends that share the format, in the same change — otherwise the shared-format consistency test breaks (as it has here).

Suggested fix: mirror so3_readout in dpmodel DescrptDPA4__init__ param, serialize() emit, and the output_ffn construction + forward output-mixing block (a direct port of the pt diff) — and add glu/mlp cases to the parity/consistent test matrix so the non-default paths are covered cross-backend. (A separate dpmodel-only PR won't work: a one-sided serialize-format change breaks the same consistency test in the other direction, so the two sides must land together.)

@wanghan-iapcm

Copy link
Copy Markdown
Collaborator

I may handle it after #5555 merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants