refactor(api_cc): load backends as plugins#5531
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces a dynamic plugin architecture for DeepMD backend implementations. Instead of compile-time conditional backend selection, backends (TensorFlow, PyTorch, JAX, Paddle) are now built as separate plugin libraries and loaded at runtime via platform-specific dynamic linking. The refactoring includes a new plugin ABI contract, cross-platform library loading infrastructure, templated factory helpers, backend-specific plugin wrappers, and updated integration in existing backend classes. ChangesBackend Plugin Architecture Implementation
Sequence DiagramssequenceDiagram
participant App as Application/Backend Class
participant Factory as Backend Factory
participant PluginLoader as Plugin Loader
participant SymbolRes as Symbol Resolver
participant Plugin as Backend Plugin<br/>(e.g., libdeepmd_backend_tf.so)
participant Cache as Plugin Handle Cache
App->>Factory: create_deeppot_backend_from_plugin<br/>(backend, model, gpu, file_content)
Factory->>Cache: load_plugin(DPBackend::TensorFlow)
Cache->>PluginLoader: Check cached handle for TensorFlow
alt Plugin not cached
PluginLoader->>PluginLoader: Build candidate paths:<br/>DP_BACKEND_PLUGIN_PATH, libdeepmd_cc dir
loop For each candidate
PluginLoader->>Plugin: dlopen(candidate_path)
alt Success
Plugin-->>PluginLoader: Valid handle
else Failed
PluginLoader->>PluginLoader: Log error, try next
end
end
PluginLoader->>SymbolRes: Resolve create/delete/free symbols
SymbolRes->>Plugin: dlsym(handle, symbol_name)
Plugin-->>SymbolRes: Function pointer
SymbolRes-->>PluginLoader: Symbol dict
end
PluginLoader-->>Cache: Store/return PluginHandle
Cache-->>Factory: PluginHandle{create_fn, delete_fn, free_error_fn}
Factory->>Plugin: create_fn(model, gpu, file_content, &error_msg)
Plugin-->>Factory: void* backend_instance
Factory-->>Factory: Wrap in std::shared_ptr<br/>with custom deleter
Factory-->>App: std::shared_ptr<DeepPotBackend>
sequenceDiagram
participant Init as Backend Class::init()
participant OpLib as load_op_library(DPBackend)
participant Loader as Library Loader
participant BuiltIn as Built-in Op Library<br/>(TF/PT/Paddle)
participant Custom as Custom Plugins
Init->>OpLib: load_op_library(DPBackend::PyTorch)
OpLib->>Loader: Determine which backend<br/>built-in lib to load
Loader->>BuiltIn: dlopen PyTorch operator library
BuiltIn-->>Loader: Loaded
Loader-->>OpLib: Done with backend lib
OpLib->>Custom: _load_customized_plugins()<br/>via DP_PLUGIN_PATH
Custom->>Loader: For each plugin path entry
Loader-->>Custom: Load plugin DSO
Custom-->>OpLib: All custom plugins loaded
OpLib-->>Init: Initialization complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
source/api_cc/src/BackendPlugin.cc (1)
312-343: ⚡ Quick winInconsistent exception handling compared to
create_deeppot_backend_from_plugin.The DeepPot factory (lines 282-294) wraps the plugin creation call in a try-catch to re-wrap exceptions from the plugin. This factory and the DeepTensor/DipoleChargeModifier factories below call their respective create functions without this protection.
If a plugin's create function throws (instead of returning nullptr with an error message), those exceptions will propagate unwrapped with potentially confusing stack traces.
Suggested fix to add consistent exception handling
char* error_message = nullptr; - void* backend_handle = - create_deepspin(model.c_str(), gpu_rank, file_content.data(), - file_content.size(), &error_message); + void* backend_handle = nullptr; + try { + backend_handle = + create_deepspin(model.c_str(), gpu_rank, file_content.data(), + file_content.size(), &error_message); + } catch (const deepmd::deepmd_exception& e) { + throw; + } catch (const std::exception& e) { + throw deepmd::deepmd_exception("Backend plugin " + plugin->path + + " threw an exception: " + e.what()); + } catch (...) { + throw deepmd::deepmd_exception("Backend plugin " + plugin->path + + " threw an unknown exception"); + }Apply similar wrapping to
create_deeptensor_backend_from_pluginandcreate_dipole_charge_modifier_backend_from_plugin.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/api_cc/src/BackendPlugin.cc` around lines 312 - 343, The create_deepspin_backend_from_plugin factory (and the related create_deeptensor_backend_from_plugin and create_dipole_charge_modifier_backend_from_plugin) must mirror the exception-wrapping used in create_deeppot_backend_from_plugin: wrap the plugin create_* call in a try-catch that catches std::exception (and a catch-all) and rethrow as deepmd::deepmd_exception with the plugin path and any plugin error message (use take_plugin_error(plugin, error_message) or plugin->free_error as appropriate); specifically, enclose the call to create_deepspin(...) in a try block and on catch build the same formatted message used when create_deepspin returns nullptr, then throw deepmd_exception, and apply the same pattern to create_deeptensor_backend_from_plugin and create_dipole_charge_modifier_backend_from_plugin so plugin-thrown exceptions are consistently wrapped.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@source/api_cc/src/BackendPlugin.cc`:
- Around line 312-343: The create_deepspin_backend_from_plugin factory (and the
related create_deeptensor_backend_from_plugin and
create_dipole_charge_modifier_backend_from_plugin) must mirror the
exception-wrapping used in create_deeppot_backend_from_plugin: wrap the plugin
create_* call in a try-catch that catches std::exception (and a catch-all) and
rethrow as deepmd::deepmd_exception with the plugin path and any plugin error
message (use take_plugin_error(plugin, error_message) or plugin->free_error as
appropriate); specifically, enclose the call to create_deepspin(...) in a try
block and on catch build the same formatted message used when create_deepspin
returns nullptr, then throw deepmd_exception, and apply the same pattern to
create_deeptensor_backend_from_plugin and
create_dipole_charge_modifier_backend_from_plugin so plugin-thrown exceptions
are consistently wrapped.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 512a9b00-506e-49d0-94ca-c4502a8a7684
📒 Files selected for processing (27)
source/api_c/CMakeLists.txtsource/api_cc/CMakeLists.txtsource/api_cc/include/BackendPlugin.hsource/api_cc/include/common.hsource/api_cc/include/commonTF.hsource/api_cc/src/BackendPlugin.ccsource/api_cc/src/BackendPluginFactory.hsource/api_cc/src/DataModifier.ccsource/api_cc/src/DataModifierTF.ccsource/api_cc/src/DeepPot.ccsource/api_cc/src/DeepPotJAXPlugin.ccsource/api_cc/src/DeepPotPD.ccsource/api_cc/src/DeepPotPDPlugin.ccsource/api_cc/src/DeepPotPT.ccsource/api_cc/src/DeepPotPTExpt.ccsource/api_cc/src/DeepPotPTExptPlugin.ccsource/api_cc/src/DeepPotPTPlugin.ccsource/api_cc/src/DeepPotTF.ccsource/api_cc/src/DeepPotTFPlugin.ccsource/api_cc/src/DeepSpin.ccsource/api_cc/src/DeepSpinPT.ccsource/api_cc/src/DeepSpinPTExpt.ccsource/api_cc/src/DeepSpinTF.ccsource/api_cc/src/DeepTensor.ccsource/api_cc/src/DeepTensorPT.ccsource/api_cc/src/DeepTensorTF.ccsource/api_cc/src/common.cc
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #5531 +/- ##
==========================================
- Coverage 82.19% 82.11% -0.08%
==========================================
Files 891 898 +7
Lines 101599 101972 +373
Branches 4242 4274 +32
==========================================
+ Hits 83507 83737 +230
- Misses 16789 16908 +119
- Partials 1303 1327 +24 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Thanks for the refactor — splitting the API C/C++ backend implementation into dynamically loaded plugins looks like the right direction. I think a few follow-ups are worth addressing before merge:
A small install/runtime test for plugin discovery would also be helpful if feasible, especially for the no-backend C/C++ API case. — OpenClaw 2026.5.28 (e932160) (model: custom-chat-jinzhezeng-group/gpt-5.5) |
|
Addressed the review follow-ups in d6795db.
Local validation: rebuilt deepmd_cc and deepmd_backend_tf, ran the C API DeepPot smoke test through the TensorFlow plugin, and ran ruff/pre-commit checks for the changed files. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@doc/inference/cxx.md`:
- Around line 40-68: The Backend plugins section documents how to use no-backend
libdeepmd_cc builds but lacks instructions on how to build them. Add
documentation describing the build-time configuration steps (such as the
specific CMake options or flags) required to disable backends when building
libdeepmd_cc. This information should be placed near the existing mention of
no-backend builds to provide complete guidance for users who want to create a
no-backend library build.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 11f1e1df-1711-4015-927d-7a70efcf7f2f
📒 Files selected for processing (4)
doc/env.mddoc/inference/cxx.mdsource/api_cc/CMakeLists.txtsource/api_cc/src/BackendPlugin.cc
🚧 Files skipped from review as they are similar to previous changes (2)
- source/api_cc/src/BackendPlugin.cc
- source/api_cc/CMakeLists.txt
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
Summary
Validation
PyTorch plugin build was not validated locally because the installed torch package is CUDA-enabled and this environment does not provide CUDA toolkit files required by TorchConfig.cmake.
Summary by CodeRabbit
DP_BACKEND_PLUGIN_PATH.