From a9dfc01a846a984972c0ed5c474faa978add0cbd Mon Sep 17 00:00:00 2001
From: "njzjz-bot (driven by OpenClaw (model:
 custom-chat-jinzhezeng-group/gpt-5.5))[bot]"
 <48687836+njzjz-bot@users.noreply.github.com>
Date: Sat, 13 Jun 2026 17:57:19 +0000
Subject: [PATCH] docs(skills): consolidate training skill disclosure

Move model-specific DeePMD-kit training recipes under a single deepmd-train skill so agents first choose a model and only then read the selected configuration reference. Document the progressive-disclosure pattern for future skill additions.

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
---
 doc/agent-skills.md                   |  20 +-
 skills/deepmd-train-dpa3/SKILL.md     | 332 --------------------------
 skills/deepmd-train-se-e2-a/SKILL.md  | 271 ---------------------
 skills/deepmd-train/SKILL.md          | 130 ++++++++++
 skills/deepmd-train/models/dpa3.md    | 161 +++++++++++++
 skills/deepmd-train/models/se-e2-a.md | 119 +++++++++
 6 files changed, 420 insertions(+), 613 deletions(-)
 delete mode 100644 skills/deepmd-train-dpa3/SKILL.md
 delete mode 100644 skills/deepmd-train-se-e2-a/SKILL.md
 create mode 100644 skills/deepmd-train/SKILL.md
 create mode 100644 skills/deepmd-train/models/dpa3.md
 create mode 100644 skills/deepmd-train/models/se-e2-a.md

diff --git a/doc/agent-skills.md b/doc/agent-skills.md
index a78631d445..54abd87d2e 100644
--- a/doc/agent-skills.md
+++ b/doc/agent-skills.md
@@ -3,8 +3,9 @@
 DeePMD-kit provides official [Agent Skills](https://agentskills.io/what-are-skills) that help AI agents run
 DeePMD-kit workflows in a reproducible way. These skills capture
 project-specific operating knowledge—such as training inputs, model
-deployment, LAMMPS integration, and Python inference patterns—so an agent can
-turn a high-level request into concrete files, commands, and validation steps.
+selection, deployment, LAMMPS integration, and Python inference patterns—so an
+agent can turn a high-level request into concrete files, commands, and
+validation steps.
 
 The DeePMD-kit skills were initially developed in the
 [Computational Chemistry Agent Skills](https://github.com/jinzhezenggroup/computational-chemistry-agent-skills)
@@ -13,14 +14,13 @@ in the DeePMD-kit repository under `skills/`.
 
 ## List of skills
 
-- `deepmd-train-dpa3`: Train DeePMD-kit models with the DPA3 descriptor and the
-  PyTorch backend, including input generation, neighbor-selection choices,
-  training, freezing, and testing.
+- `deepmd-train`: Choose a DeePMD-kit model family, then train from scratch.
+  The skill uses progressive disclosure: the top-level workflow handles common
+  training steps and model selection, while model-specific configuration lives
+  under `skills/deepmd-train/models/` and is read only after a model is chosen.
+  Current references include DPA3 and se_e2_a.
 - `deepmd-finetune-dpa3`: Fine-tune DPA3 models from self-trained checkpoints,
   multi-task pretrained models, or built-in models downloaded by `dp pretrained download`.
-- `deepmd-train-se-e2-a`: Train classical Deep Potential models with the
-  `se_e2_a` descriptor, including preparation of training JSON files and
-  post-training validation.
 - `deepmd-python-inference`: Run Python and CLI inference with trained or
   frozen DeePMD-kit models, including energy, force, virial, descriptor, and
   model-deviation workflows.
@@ -76,7 +76,7 @@ without launching an expensive calculation. For example:
 
 - “Use the `deepmd-python-inference` skill to write a minimal Python snippet
   for loading a frozen DeePMD-kit model and evaluating one frame.”
-- “Use the `deepmd-train-dpa3` skill to draft a small DPA3 training input for a
-  water dataset, but do not start training.”
+- “Use the `deepmd-train` skill to choose between DPA3 and se_e2_a for a small
+  water dataset and draft a training input, but do not start training.”
 - “Use the `lammps-deepmd` skill to prepare an NVT LAMMPS input file for a
   DeePMD-kit model, and explain each command.”
diff --git a/skills/deepmd-train-dpa3/SKILL.md b/skills/deepmd-train-dpa3/SKILL.md
deleted file mode 100644
index 9c361e041a..0000000000
--- a/skills/deepmd-train-dpa3/SKILL.md
+++ /dev/null
@@ -1,332 +0,0 @@
----
-name: deepmd-train-dpa3
-description: Train a DeePMD-kit model using the DPA3 descriptor with the PyTorch backend. Use when the user wants to train a state-of-the-art deep potential model based on message passing on Line Graph Series (LiGS). DPA3 provides high accuracy and strong generalization, suitable for large atomic models (LAM) and diverse chemical systems. Supports both fixed and dynamic neighbor selection.
-compatibility: Requires deepmd-kit with PyTorch backend installed. GPU strongly recommended. Custom OP library required for LAMMPS deployment.
-license: LGPL-3.0-or-later
-metadata:
-  author: iProzd
-  version: '1.0'
-  repository: https://github.com/deepmodeling/deepmd-kit
----
-
-# DeePMD-kit Training: DPA3
-
-Train a deep potential model using the DPA3 descriptor, an advanced message-passing architecture operating on Line Graph Series (LiGS). DPA3 is designed as a large atomic model (LAM) with high fitting accuracy and robust generalization across diverse chemical and materials systems.
-
-## Quick Start
-
-```bash
-dp --pt train input.json
-```
-
-## Agent Responsibilities
-
-1. Confirm the user has a working deepmd-kit environment with PyTorch backend.
-1. Collect the minimum required information:
-   - Training data paths (deepmd/npy or deepmd/hdf5 format)
-   - Validation data paths
-   - Element types (type_map)
-   - Target number of training steps
-   - Model size preference (L3/L6/L12 layers)
-1. Generate a complete `input.json` training configuration.
-1. Decide whether to use fixed or dynamic neighbor selection based on system diversity.
-1. Run training and monitor the learning curve.
-1. Freeze the trained model and optionally test it.
-
-## Workflow
-
-### Step 1: Prepare Training Data
-
-Same format as other DeePMD models. Each system directory should contain:
-
-```
-system_dir/
-├── type.raw
-├── type_map.raw
-└── set.000/
-    ├── coord.npy
-    ├── energy.npy
-    ├── force.npy
-    ├── box.npy
-    └── virial.npy
-```
-
-DPA3 also supports the mixed type data format for multi-element systems.
-
-### Step 2: Write input.json
-
-#### Standard DPA3 (fixed selection)
-
-```json
-{
-  "model": {
-    "type_map": [
-      "O",
-      "H"
-    ],
-    "descriptor": {
-      "type": "dpa3",
-      "repflow": {
-        "n_dim": 128,
-        "e_dim": 64,
-        "a_dim": 32,
-        "nlayers": 6,
-        "e_rcut": 6.0,
-        "e_rcut_smth": 5.3,
-        "e_sel": 120,
-        "a_rcut": 4.0,
-        "a_rcut_smth": 3.5,
-        "a_sel": 30,
-        "axis_neuron": 4,
-        "fix_stat_std": 0.3,
-        "a_compress_rate": 1,
-        "a_compress_e_rate": 2,
-        "a_compress_use_split": true,
-        "update_angle": true,
-        "smooth_edge_update": true,
-        "edge_init_use_dist": true,
-        "use_exp_switch": true,
-        "update_style": "res_residual",
-        "update_residual": 0.1,
-        "update_residual_init": "const"
-      },
-      "activation_function": "silut:10.0",
-      "use_tebd_bias": false,
-      "precision": "float32",
-      "concat_output_tebd": false,
-      "seed": 1
-    },
-    "fitting_net": {
-      "neuron": [
-        240,
-        240,
-        240
-      ],
-      "resnet_dt": true,
-      "precision": "float32",
-      "activation_function": "silut:10.0",
-      "seed": 1
-    }
-  },
-  "learning_rate": {
-    "type": "exp",
-    "decay_steps": 5000,
-    "start_lr": 0.001,
-    "stop_lr": 3e-05
-  },
-  "loss": {
-    "type": "ener",
-    "start_pref_e": 0.2,
-    "limit_pref_e": 20,
-    "start_pref_f": 100,
-    "limit_pref_f": 60,
-    "start_pref_v": 0.02,
-    "limit_pref_v": 1
-  },
-  "optimizer": {
-    "type": "AdamW",
-    "adam_beta1": 0.9,
-    "adam_beta2": 0.999,
-    "weight_decay": 0.001
-  },
-  "training": {
-    "stat_file": "./dpa3.hdf5",
-    "training_data": {
-      "systems": [
-        "./data/train_0",
-        "./data/train_1",
-        "./data/train_2"
-      ],
-      "batch_size": 1
-    },
-    "validation_data": {
-      "systems": [
-        "./data/valid_0"
-      ],
-      "batch_size": 1
-    },
-    "numb_steps": 1000000,
-    "gradient_max_norm": 5.0,
-    "seed": 10,
-    "disp_file": "lcurve.out",
-    "disp_freq": 100,
-    "save_freq": 2000
-  }
-}
-```
-
-If you do not want to train on virial, set the virial prefactors to 0.
-
-DPA3 uses different default loss prefactors compared to SE_E2_A. See the comparison table in the "Key Differences from SE_E2_A" section below.
-
-The meaning of each parameter can be generated through `dp doc-train-input`.
-Considering the output RST documentation on the screen is very long, use `grep` to find the documentation of a specific parameter:
-
-```sh
-dp doc-train-input | grep -A 7 training/numb_steps
-dp doc-train-input | grep -A 7 'model\[standard\]/descriptor\[dpa3\]/repflow/e_sel'
-```
-
-#### DPA3 with Dynamic Selection
-
-For systems with highly variable neighbor counts (e.g., multi-element datasets), use dynamic selection by modifying the `repflow` section:
-
-```json
-"repflow": {
-  "e_sel": 1200,
-  "a_sel": 300,
-  "use_dynamic_sel": true,
-  "sel_reduce_factor": 10.0
-}
-```
-
-When `use_dynamic_sel` is true, the effective selection is `e_sel / sel_reduce_factor` and `a_sel / sel_reduce_factor` (i.e., 120 and 30 in this example), but the model dynamically adapts to varying neighbor counts.
-
-### Step 3: Run Training
-
-```bash
-dp --pt train input.json
-```
-
-To restart from a checkpoint:
-
-```bash
-dp --pt train input.json --restart model.ckpt.pt
-```
-
-### Step 4: Monitor Training
-
-The learning curve is written to `lcurve.out` with columns:
-
-```
-#  step  rmse_val  rmse_trn  rmse_e_val  rmse_e_trn  rmse_f_val  rmse_f_trn  rmse_v_val  rmse_v_trn  lr
-```
-
-- `rmse_e_*`: energy RMSE per atom (eV/atom)
-- `rmse_f_*`: force RMSE (eV/A)
-- `rmse_v_*`: virial RMSE (eV/atom, only present if virial data is available)
-- `lr`: current learning rate
-
-### Step 5: Freeze the Model
-
-```bash
-dp --pt freeze -o model.pth
-```
-
-### Step 6: Test the Model
-
-```bash
-dp --pt test -m model.pth -s /path/to/test_system -n 30
-```
-
-## Model Size Guide
-
-Choose the number of layers based on accuracy vs. cost trade-off:
-
-| Model         | nlayers | n_dim | e_dim | a_dim | Relative Cost | Use Case                           |
-| ------------- | ------- | ----- | ----- | ----- | ------------- | ---------------------------------- |
-| DPA3-L3       | 3       | 256   | 128   | 32    | 1x            | Quick prototyping, smaller systems |
-| DPA3-L3-small | 3       | 128   | 64    | 32    | 0.8x          | Fast iteration, limited GPU memory |
-| DPA3-L6       | 6       | 256   | 128   | 32    | 2x            | Recommended for production         |
-| DPA3-L6-small | 6       | 128   | 64    | 32    | 1.4x          | Good accuracy/cost balance         |
-
-Benchmark RMSE (averaged over 6 representative systems, 0.5M steps):
-
-| Model                     | Energy (meV/atom) | Force (meV/A) | Virial (meV/atom) |
-| ------------------------- | ----------------- | ------------- | ----------------- |
-| DPA3-L3 (256/128/32)      | 5.74              | 85.4          | 43.1              |
-| DPA3-L3-small (128/64/32) | 6.99              | 93.6          | 46.7              |
-| DPA3-L6 (256/128/32)      | 4.85              | 79.9          | 39.7              |
-| DPA3-L6-small (128/64/32) | 5.11              | 77.7          | 41.2              |
-| DPA2-L6 (reference)       | 12.12             | 109.3         | 83.1              |
-
-## Key Differences from SE_E2_A
-
-| Aspect            | SE_E2_A              | DPA3                            |
-| ----------------- | -------------------- | ------------------------------- |
-| Architecture      | Two-body embedding   | Message passing on LiGS         |
-| Default precision | float64              | float32                         |
-| Optimizer         | Adam                 | AdamW (with weight_decay)       |
-| Loss prefactors   | e: 0.02→1, f: 1000→1 | e: 0.2→20, f: 100→60, v: 0.02→1 |
-| stop_lr           | 3.51e-8              | 3e-5                            |
-| Gradient clipping | Not used             | gradient_max_norm: 5.0          |
-| Virial training   | Optional             | Recommended                     |
-| Model compression | Supported            | Not supported                   |
-| Activation        | tanh (default)       | silut:10.0                      |
-
-## Key Hyperparameters
-
-### Repflow (Descriptor)
-
-| Parameter         | Description                      | Default        |
-| ----------------- | -------------------------------- | -------------- |
-| `n_dim`           | Node embedding dimension         | 128 or 256     |
-| `e_dim`           | Edge embedding dimension         | 64 or 128      |
-| `a_dim`           | Angle embedding dimension        | 32             |
-| `nlayers`         | Number of message passing layers | 3 or 6         |
-| `e_rcut`          | Edge cutoff radius (A)           | 6.0            |
-| `e_rcut_smth`     | Edge smooth cutoff start         | 5.3            |
-| `e_sel`           | Max edge neighbors               | 120            |
-| `a_rcut`          | Angle cutoff radius (A)          | 4.0            |
-| `a_rcut_smth`     | Angle smooth cutoff start        | 3.5            |
-| `a_sel`           | Max angle neighbors              | 30             |
-| `update_style`    | Residual update style            | "res_residual" |
-| `update_residual` | Residual scaling factor          | 0.1            |
-
-### Activation Function
-
-DPA3 uses `silut:10.0` by default. For datasets where training is unstable, consider switching to `tanh`:
-
-```json
-"descriptor": {
-  "type": "dpa3",
-  "repflow": { ... },
-  "activation_function": "tanh"
-},
-"fitting_net": {
-  "activation_function": "tanh"
-}
-```
-
-### Optimizer
-
-DPA3 uses AdamW by default (decoupled weight decay):
-
-```json
-"optimizer": {
-  "type": "AdamW",
-  "adam_beta1": 0.9,
-  "adam_beta2": 0.999,
-  "weight_decay": 0.001
-}
-```
-
-### Gradient Clipping
-
-Recommended for DPA3 to stabilize training:
-
-```json
-"training": {
-  "gradient_max_norm": 5.0
-}
-```
-
-## Agent Checklist
-
-- [ ] Training data exists and is in deepmd format
-- [ ] `type_map` matches the elements in the data
-- [ ] Precision is set to `float32` (DPA3 default, not float64)
-- [ ] AdamW optimizer is configured with weight_decay
-- [ ] `gradient_max_norm` is set (recommended: 5.0)
-- [ ] `stop_lr` is 3e-5 (not 3.51e-8 as in SE_E2_A)
-- [ ] Virial loss prefactors are included if virial data is available
-- [ ] `stat_file` is set to cache statistics (avoids recomputation on restart)
-- [ ] Training completes without NaN in `lcurve.out`
-- [ ] Model is frozen to `.pth` after training
-
-## References
-
-- [DPA3 descriptor documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/dpa3.html)
-- [DPA3 paper](https://arxiv.org/abs/2506.01686)
-- [Training documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html)
-- [DeePMD-kit GitHub](https://github.com/deepmodeling/deepmd-kit)
diff --git a/skills/deepmd-train-se-e2-a/SKILL.md b/skills/deepmd-train-se-e2-a/SKILL.md
deleted file mode 100644
index 11bcc992bf..0000000000
--- a/skills/deepmd-train-se-e2-a/SKILL.md
+++ /dev/null
@@ -1,271 +0,0 @@
----
-name: deepmd-train-se-e2-a
-description: Train a DeePMD-kit model using the SE_E2_A (DeepPot-SE) descriptor with the PyTorch backend. Use when the user wants to train a classical deep potential model for a specific system, prepare training input JSON, run `dp --pt train`, monitor learning curves, freeze the model, and test it. SE_E2_A is the foundational two-body embedding descriptor suitable for most condensed-phase systems.
-compatibility: Requires deepmd-kit with PyTorch backend installed. GPU recommended for production training.
-license: LGPL-3.0-or-later
-metadata:
-  author: iProzd
-  version: '1.0'
-  repository: https://github.com/deepmodeling/deepmd-kit
----
-
-# DeePMD-kit Training: SE_E2_A
-
-Train a deep potential model using the SE_E2_A (Smooth Edition, two-body embedding, all information) descriptor. This is the foundational DeepPot-SE architecture suitable for most condensed-phase systems.
-
-## Quick Start
-
-```bash
-dp --pt train input.json
-```
-
-## Agent Responsibilities
-
-1. Confirm the user has a working deepmd-kit environment with PyTorch backend.
-1. Collect the minimum required information:
-   - Training data paths (deepmd/npy or deepmd/hdf5 format)
-   - Validation data paths
-   - Element types (type_map)
-   - Target number of training steps
-1. Generate a complete `input.json` training configuration.
-1. Explain key hyperparameters if the user is unfamiliar.
-1. Run training and monitor the learning curve (`lcurve.out`).
-1. Freeze the trained model to `.pth` format.
-1. Optionally test the model with `dp test`.
-
-## Workflow
-
-### Step 1: Prepare Training Data
-
-Training data must be in DeePMD format (deepmd/npy or deepmd/hdf5). Each system directory should contain:
-
-```
-system_dir/
-├── type.raw          # atom type indices, one integer per atom
-├── type_map.raw      # element names, one per line
-└── set.000/
-    ├── coord.npy     # coordinates (nframes, natoms*3)
-    ├── energy.npy    # energies (nframes, 1)
-    ├── force.npy     # forces (nframes, natoms*3)
-    └── box.npy       # cell vectors (nframes, 9)
-```
-
-If the user has DFT output (VASP OUTCAR, etc.), refer to the `dpdata-cli` skill for format conversion.
-
-### Step 2: Write input.json
-
-A complete SE_E2_A training configuration:
-
-```json
-{
-  "model": {
-    "type_map": [
-      "O",
-      "H"
-    ],
-    "descriptor": {
-      "type": "se_e2_a",
-      "sel": [
-        46,
-        92
-      ],
-      "rcut_smth": 0.5,
-      "rcut": 6.0,
-      "neuron": [
-        25,
-        50,
-        100
-      ],
-      "resnet_dt": false,
-      "axis_neuron": 16,
-      "type_one_side": true,
-      "seed": 1
-    },
-    "fitting_net": {
-      "neuron": [
-        240,
-        240,
-        240
-      ],
-      "resnet_dt": true,
-      "seed": 1
-    }
-  },
-  "learning_rate": {
-    "type": "exp",
-    "decay_steps": 5000,
-    "start_lr": 0.001,
-    "stop_lr": 3.51e-08
-  },
-  "loss": {
-    "type": "ener",
-    "start_pref_e": 0.02,
-    "limit_pref_e": 1,
-    "start_pref_f": 1000,
-    "limit_pref_f": 1,
-    "start_pref_v": 0.02,
-    "limit_pref_v": 1
-  },
-  "training": {
-    "training_data": {
-      "systems": [
-        "./data/train_system_0",
-        "./data/train_system_1"
-      ],
-      "batch_size": "auto"
-    },
-    "validation_data": {
-      "systems": [
-        "./data/valid_system_0"
-      ],
-      "batch_size": 1,
-      "numb_btch": 3
-    },
-    "numb_steps": 400000,
-    "seed": 10,
-    "disp_file": "lcurve.out",
-    "disp_freq": 100,
-    "save_freq": 10000
-  }
-}
-```
-
-If you do not want to train on virial, set the virial prefactors to 0.
-
-SE_E2_A uses different default loss prefactors compared to DPA3 (e: 0.02→1, f: 1000→1 vs. e: 0.2→20, f: 100→60, v: 0.02→1).
-
-The meaning of each parameter can be generated through `dp doc-train-input`.
-Considering the output RST documentation on the screen is very long, use `grep` to find the documentation of a specific parameter:
-
-```sh
-dp doc-train-input | grep -A 7 training/numb_steps
-dp doc-train-input | grep -A 7 'model\[standard\]/descriptor\[se_e2_a\]/sel'
-```
-
-### Step 3: Run Training
-
-```bash
-dp --pt train input.json
-```
-
-To restart from a checkpoint:
-
-```bash
-dp --pt train input.json --restart model.ckpt.pt
-```
-
-To initialize from an existing model:
-
-```bash
-dp --pt train input.json --init-model model.ckpt.pt
-```
-
-### Step 4: Monitor Training
-
-The learning curve is written to `lcurve.out` with columns:
-
-```
-#  step  rmse_val  rmse_trn  rmse_e_val  rmse_e_trn  rmse_f_val  rmse_f_trn  rmse_v_val  rmse_v_trn  lr
-```
-
-- `rmse_e_*`: energy RMSE per atom (eV/atom)
-- `rmse_f_*`: force RMSE (eV/A)
-- `rmse_v_*`: virial RMSE (eV/atom, only present if virial data is available)
-- `lr`: current learning rate
-
-Quick visualization:
-
-```python
-import numpy as np
-import matplotlib.pyplot as plt
-
-data = np.genfromtxt("lcurve.out", names=True)
-for name in data.dtype.names[1:-1]:
-    plt.plot(data["step"], data[name], label=name)
-plt.legend()
-plt.xlabel("Step")
-plt.ylabel("Loss")
-plt.xscale("symlog")
-plt.yscale("log")
-plt.grid()
-plt.show()
-```
-
-### Step 5: Freeze the Model
-
-```bash
-dp --pt freeze -o model.pth
-```
-
-### Step 6: Test the Model
-
-```bash
-dp --pt test -m model.pth -s /path/to/test_system -n 30
-```
-
-## Key Hyperparameters
-
-### Descriptor
-
-| Parameter       | Description                         | Typical Value    |
-| --------------- | ----------------------------------- | ---------------- |
-| `rcut`          | Cutoff radius (A)                   | 6.0              |
-| `rcut_smth`     | Smooth cutoff start (A)             | 0.5              |
-| `sel`           | Max neighbors per type              | System-dependent |
-| `neuron`        | Embedding net sizes                 | [25, 50, 100]    |
-| `axis_neuron`   | Axis matrix dimension               | 16               |
-| `type_one_side` | Share embedding across center types | true             |
-
-### Fitting Net
-
-| Parameter   | Description            | Typical Value   |
-| ----------- | ---------------------- | --------------- |
-| `neuron`    | Hidden layer sizes     | [240, 240, 240] |
-| `resnet_dt` | Use timestep in ResNet | true            |
-
-### Loss Prefactors
-
-| JSON keys                       | Description              | Start | Limit |
-| ------------------------------- | ------------------------ | ----- | ----- |
-| `start_pref_e` / `limit_pref_e` | Energy weight            | 0.02  | 1     |
-| `start_pref_f` / `limit_pref_f` | Force weight             | 1000  | 1     |
-| `start_pref_v` / `limit_pref_v` | Virial weight (optional) | 0.02  | 1     |
-
-Here, `start_pref_*` and `limit_pref_*` set the initial and final loss weights; the loss shifts from force-dominated early training to balanced energy+force later. For virial, set to 0 if not training on virial data.
-
-### Training
-
-| Parameter     | Description           | Typical Value       |
-| ------------- | --------------------- | ------------------- |
-| `numb_steps`  | Total training steps  | 400000-1000000      |
-| `batch_size`  | Frames per step       | "auto" or "auto:32" |
-| `start_lr`    | Initial learning rate | 0.001               |
-| `stop_lr`     | Final learning rate   | 3.51e-8             |
-| `decay_steps` | LR decay interval     | 5000                |
-
-### Setting `sel`
-
-`sel` is a list with one entry per element type, specifying the maximum number of neighbors of that type within `rcut`. To determine appropriate values:
-
-```bash
-dp --pt neighbor-stat -s /path/to/data -r 6.0 -t O H
-```
-
-Use values slightly above the reported maximum.
-
-## Agent Checklist
-
-- [ ] Training data exists and is in deepmd format
-- [ ] `type_map` matches the elements in the data
-- [ ] `sel` is appropriate for the system (use `dp neighbor-stat` if unsure)
-- [ ] `rcut` is reasonable for the system (typically 6.0-9.0 A)
-- [ ] Training completes without NaN in `lcurve.out`
-- [ ] Model is frozen to `.pth` after training
-- [ ] Test RMSE values are reported to the user
-
-## References
-
-- [SE_E2_A descriptor documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/train-se-e2-a.html)
-- [Training documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html)
-- [Training advanced options](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training-advanced.html)
-- [DeePMD-kit GitHub](https://github.com/deepmodeling/deepmd-kit)
diff --git a/skills/deepmd-train/SKILL.md b/skills/deepmd-train/SKILL.md
new file mode 100644
index 0000000000..d38e7858d7
--- /dev/null
+++ b/skills/deepmd-train/SKILL.md
@@ -0,0 +1,130 @@
+---
+name: deepmd-train
+description: Train DeePMD-kit models with progressive disclosure. Use when the user wants to train a DeePMD-kit potential, prepare an input.json, choose between model families such as se_e2_a/DeepPot-SE and DPA3, run `dp train`, monitor learning curves, freeze checkpoints, or test trained models. Start with model selection and read only the selected model reference under `models/` when model-specific configuration is needed.
+compatibility: Requires deepmd-kit installed. The selected backend and model may require PyTorch, TensorFlow, JAX, Paddle, GPU support, or custom OP libraries.
+license: LGPL-3.0-or-later
+metadata:
+  author: iProzd
+  version: '1.1'
+  repository: https://github.com/deepmodeling/deepmd-kit
+---
+
+# DeePMD-kit Training
+
+Use this skill to guide DeePMD-kit model training without loading every model-specific recipe up front.
+The workflow is intentionally progressive:
+
+1. Understand the user's data, target accuracy, compute budget, and deployment backend.
+1. Choose an appropriate model family.
+1. Read only the reference file for the selected model under [`models/`](models/).
+1. Generate or edit `input.json`, run training, monitor, freeze, and test.
+
+## Progressive disclosure protocol
+
+Do not start by reading every model document. First classify the request:
+
+- If the user already named a model, read only that model reference.
+- If the user asks for a recommendation, collect the decision inputs below, choose a model, then read only the selected reference.
+- If model-specific parameters are not needed yet, stay in this top-level workflow.
+
+Available model references:
+
+| Model reference                          | Read when                                                                                                                                |
+| ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
+| [`models/se-e2-a.md`](models/se-e2-a.md) | The user wants a classical DeepPot-SE baseline, broad compatibility, or a smaller/established production model.                          |
+| [`models/dpa3.md`](models/dpa3.md)       | The user wants a high-accuracy DPA3/LAM workflow, large/diverse datasets, dynamic neighbor selection, or pretrained DPA3-style training. |
+
+## Model selection
+
+Ask only for missing information that changes the choice. Prefer reasonable defaults when the answer is obvious from context.
+
+Key inputs:
+
+- Data format and size: deepmd/npy, deepmd/hdf5, mixed type, number of systems/frames/elements.
+- Target: quick baseline, production accuracy, large atomic model, transfer/fine-tuning, or deployment in MD.
+- Compute: CPU/GPU, available memory, single-node vs. distributed training.
+- Backend/deployment: PyTorch/TensorFlow/JAX/Paddle training; LAMMPS, Python inference, or other downstream use.
+- Labels: energy/force only or also virial/stress.
+- System diversity: single chemistry/phase vs. diverse multi-domain datasets.
+
+Recommended defaults:
+
+- Choose **se_e2_a** for a robust baseline, small to medium systems, compatibility-focused workflows, or when compute is limited.
+- Choose **DPA3** for high accuracy on diverse datasets, LAM-style training, or when the user explicitly asks for DPA3, DPA-3, LiGS, dynamic neighbor selection, or pretrained DPA3 variants.
+
+## Common workflow
+
+### 1. Confirm environment
+
+```bash
+dp --version
+```
+
+For PyTorch training, use `dp --pt ...`; for TensorFlow, use `dp --tf ...` (plain `dp ...` also selects TensorFlow when no other default backend is configured); for other backends, confirm the installed backend first.
+
+### 2. Confirm training data
+
+Training data should be in DeePMD format, typically deepmd/npy or deepmd/hdf5. If the user has raw electronic-structure outputs, convert them first with dpdata before writing the training input.
+
+Minimum information needed to build `input.json`:
+
+- `type_map`
+- training system paths
+- validation system paths
+- whether virial labels are present and should be trained
+- target number of steps or accuracy/time budget
+- model choice
+
+### 3. Read the selected model reference
+
+After selecting a model, read the corresponding file under [`models/`](models/) and apply its model-specific configuration, hyperparameters, and caveats.
+
+### 4. Train
+
+```bash
+dp --pt train input.json
+```
+
+Use the backend-specific command if not using PyTorch.
+
+Restart from a checkpoint when needed:
+
+```bash
+dp --pt train input.json --restart model.ckpt.pt
+```
+
+### 5. Monitor
+
+Training progress is usually written to `lcurve.out`. Check for:
+
+- decreasing validation RMSE
+- NaN or exploding losses
+- train/validation divergence
+- learning-rate schedule behaving as expected
+
+### 6. Freeze and test
+
+```bash
+dp --pt freeze -o model.pth
+dp --pt test -m model.pth -s /path/to/test_system -n 30
+```
+
+Adjust the backend flags and output extension for non-PyTorch models.
+
+## Agent checklist
+
+- [ ] Model was selected before reading model-specific details.
+- [ ] Only the selected model reference was loaded.
+- [ ] Training/validation data paths exist or are clearly marked as placeholders.
+- [ ] `type_map` matches the data and model/pretrained checkpoint.
+- [ ] Virial loss is enabled only when virial labels are available and desired.
+- [ ] Backend command matches the selected model and installed DeePMD-kit environment.
+- [ ] The generated `input.json` is valid JSON.
+- [ ] Training was monitored via `lcurve.out` or equivalent logs.
+- [ ] Final model was frozen and tested when requested.
+
+## References
+
+- [Training documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html)
+- [Training input documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/train-input.html)
+- [Model documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/index.html)
diff --git a/skills/deepmd-train/models/dpa3.md b/skills/deepmd-train/models/dpa3.md
new file mode 100644
index 0000000000..577dc308aa
--- /dev/null
+++ b/skills/deepmd-train/models/dpa3.md
@@ -0,0 +1,161 @@
+# DPA3 training reference
+
+Read this file only after the user chooses DPA3, or when DPA3 is the best fit for the task. The shared data checks and the train/monitor/freeze/test workflow stay in `../SKILL.md`; this file records only DPA3-specific choices.
+
+## When to choose DPA3
+
+Use DPA3 for high-accuracy training on diverse systems, large atomic model workflows, DPA-3/LiGS requests, dynamic neighbor selection, or pretrained DPA3-style experiments. Assume PyTorch unless the user provides a different supported backend.
+
+## Extra inputs to collect
+
+- Model size preference: L3/L6 and small/full dimensions.
+- Whether neighbor counts vary strongly across systems, which decides fixed vs. dynamic selection.
+- Whether virial labels are available. Virial training is recommended when present.
+- GPU memory budget, because DPA3 is usually heavier than se_e2_a.
+
+## Model-specific JSON sections
+
+Merge these sections into the common training input described by `../SKILL.md`.
+
+```json
+{
+  "model": {
+    "type_map": [
+      "O",
+      "H"
+    ],
+    "descriptor": {
+      "type": "dpa3",
+      "repflow": {
+        "n_dim": 128,
+        "e_dim": 64,
+        "a_dim": 32,
+        "nlayers": 6,
+        "e_rcut": 6.0,
+        "e_rcut_smth": 5.3,
+        "e_sel": 120,
+        "a_rcut": 4.0,
+        "a_rcut_smth": 3.5,
+        "a_sel": 30,
+        "axis_neuron": 4,
+        "fix_stat_std": 0.3,
+        "a_compress_rate": 1,
+        "a_compress_e_rate": 2,
+        "a_compress_use_split": true,
+        "update_angle": true,
+        "smooth_edge_update": true,
+        "edge_init_use_dist": true,
+        "use_exp_switch": true,
+        "update_style": "res_residual",
+        "update_residual": 0.1,
+        "update_residual_init": "const"
+      },
+      "activation_function": "silut:10.0",
+      "use_tebd_bias": false,
+      "precision": "float32",
+      "concat_output_tebd": false,
+      "seed": 1
+    },
+    "fitting_net": {
+      "neuron": [
+        240,
+        240,
+        240
+      ],
+      "resnet_dt": true,
+      "precision": "float32",
+      "activation_function": "silut:10.0",
+      "seed": 1
+    }
+  },
+  "learning_rate": {
+    "type": "exp",
+    "decay_steps": 5000,
+    "start_lr": 0.001,
+    "stop_lr": 3e-05
+  },
+  "loss": {
+    "type": "ener",
+    "start_pref_e": 0.2,
+    "limit_pref_e": 20,
+    "start_pref_f": 100,
+    "limit_pref_f": 60,
+    "start_pref_v": 0.02,
+    "limit_pref_v": 1
+  },
+  "optimizer": {
+    "type": "AdamW",
+    "adam_beta1": 0.9,
+    "adam_beta2": 0.999,
+    "weight_decay": 0.001
+  },
+  "training": {
+    "stat_file": "./dpa3.hdf5",
+    "gradient_max_norm": 5.0
+  }
+}
+```
+
+Set the virial prefactors to `0` when virial labels are unavailable or should not be trained.
+
+## Dynamic selection
+
+For systems with highly variable neighbor counts, use dynamic selection in `repflow`:
+
+```json
+{
+  "e_sel": 1200,
+  "a_sel": 300,
+  "use_dynamic_sel": true,
+  "sel_reduce_factor": 10.0
+}
+```
+
+With `use_dynamic_sel: true`, the effective selection is `e_sel / sel_reduce_factor` and `a_sel / sel_reduce_factor` (120 and 30 above), while the model adapts to varying neighbor counts.
+
+## Model size guide
+
+| Model         | `nlayers` | `n_dim` | `e_dim` | `a_dim` | Relative cost | Use case                           |
+| ------------- | --------- | ------- | ------- | ------- | ------------- | ---------------------------------- |
+| DPA3-L3       | 3         | 256     | 128     | 32      | 1x            | Quick prototyping, smaller systems |
+| DPA3-L3-small | 3         | 128     | 64      | 32      | 0.8x          | Fast iteration, limited GPU memory |
+| DPA3-L6       | 6         | 256     | 128     | 32      | 2x            | Recommended for production         |
+| DPA3-L6-small | 6         | 128     | 64      | 32      | 1.4x          | Good accuracy/cost balance         |
+
+Benchmark RMSE averaged over 6 representative systems at 0.5M steps:
+
+| Model                     | Energy (meV/atom) | Force (meV/Å) | Virial (meV/atom) |
+| ------------------------- | ----------------- | ------------- | ----------------- |
+| DPA3-L3 (256/128/32)      | 5.74              | 85.4          | 43.1              |
+| DPA3-L3-small (128/64/32) | 6.99              | 93.6          | 46.7              |
+| DPA3-L6 (256/128/32)      | 4.85              | 79.9          | 39.7              |
+| DPA3-L6-small (128/64/32) | 5.11              | 77.7          | 41.2              |
+| DPA2-L6 (reference)       | 12.12             | 109.3         | 83.1              |
+
+## Key differences from se_e2_a
+
+| Aspect            | se_e2_a              | DPA3                            |
+| ----------------- | -------------------- | ------------------------------- |
+| Architecture      | Two-body embedding   | Message passing on LiGS         |
+| Default precision | `float64`            | `float32`                       |
+| Optimizer         | Adam                 | AdamW with `weight_decay`       |
+| Loss prefactors   | e: 0.02→1, f: 1000→1 | e: 0.2→20, f: 100→60, v: 0.02→1 |
+| `stop_lr`         | 3.51e-8              | 3e-5                            |
+| Gradient clipping | Usually not used     | `gradient_max_norm: 5.0`        |
+| Virial training   | Optional             | Recommended                     |
+| Model compression | Supported            | Not supported                   |
+| Activation        | `tanh` default       | `silut:10.0`                    |
+
+## DPA3 checklist
+
+- [ ] Precision is `float32` for descriptor and fitting net.
+- [ ] AdamW is configured with `weight_decay`.
+- [ ] `gradient_max_norm` is set, typically 5.0.
+- [ ] `stop_lr` is 3e-5, not the se_e2_a value.
+- [ ] Dynamic selection is enabled only when variable neighbor counts justify it.
+- [ ] `stat_file` is set to cache statistics for restart.
+
+## References
+
+- [DPA3 descriptor documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/dpa3.html)
+- [DPA3 paper](https://arxiv.org/abs/2506.01686)
diff --git a/skills/deepmd-train/models/se-e2-a.md b/skills/deepmd-train/models/se-e2-a.md
new file mode 100644
index 0000000000..6cb8a44abb
--- /dev/null
+++ b/skills/deepmd-train/models/se-e2-a.md
@@ -0,0 +1,119 @@
+# se_e2_a training reference
+
+Read this file only after the user chooses se_e2_a/DeepPot-SE, or when this model is the best fit for the task. The shared data checks and the train/monitor/freeze/test workflow stay in `../SKILL.md`; this file records only se_e2_a-specific choices.
+
+## When to choose se_e2_a
+
+Use se_e2_a for a robust DeepPot-SE baseline, broad compatibility, limited compute, or small-to-medium systems where a mature production model is preferred over a larger attention/message-passing model.
+
+## Extra inputs to collect
+
+- Cutoff radius, usually `rcut: 6.0` Å unless the chemistry requires longer interactions.
+- Neighbor selection `sel`, one value per element type. If unsure, run:
+
+```bash
+dp --pt neighbor-stat -s /path/to/data -r 6.0 -t O H
+```
+
+Use values slightly above the reported maximum.
+
+## Model-specific JSON sections
+
+Merge these sections into the common training input described by `../SKILL.md`.
+
+```json
+{
+  "model": {
+    "type_map": [
+      "O",
+      "H"
+    ],
+    "descriptor": {
+      "type": "se_e2_a",
+      "sel": [
+        46,
+        92
+      ],
+      "rcut_smth": 0.5,
+      "rcut": 6.0,
+      "neuron": [
+        25,
+        50,
+        100
+      ],
+      "resnet_dt": false,
+      "axis_neuron": 16,
+      "type_one_side": true,
+      "seed": 1
+    },
+    "fitting_net": {
+      "neuron": [
+        240,
+        240,
+        240
+      ],
+      "resnet_dt": true,
+      "seed": 1
+    }
+  },
+  "learning_rate": {
+    "type": "exp",
+    "decay_steps": 5000,
+    "start_lr": 0.001,
+    "stop_lr": 3.51e-08
+  },
+  "loss": {
+    "type": "ener",
+    "start_pref_e": 0.02,
+    "limit_pref_e": 1,
+    "start_pref_f": 1000,
+    "limit_pref_f": 1,
+    "start_pref_v": 0.02,
+    "limit_pref_v": 1
+  }
+}
+```
+
+Set the virial prefactors to `0` when virial labels are unavailable or should not be trained.
+
+## Key hyperparameters
+
+### Descriptor
+
+| Parameter       | Description                         | Typical value    |
+| --------------- | ----------------------------------- | ---------------- |
+| `rcut`          | Cutoff radius (Å)                   | 6.0              |
+| `rcut_smth`     | Smooth cutoff start (Å)             | 0.5              |
+| `sel`           | Max neighbors per type              | System-dependent |
+| `neuron`        | Embedding net sizes                 | `[25, 50, 100]`  |
+| `axis_neuron`   | Axis matrix dimension               | 16               |
+| `type_one_side` | Share embedding across center types | `true`           |
+
+### Fitting and training defaults
+
+| Parameter               | Typical value           | Note                                |
+| ----------------------- | ----------------------- | ----------------------------------- |
+| `fitting_net.neuron`    | `[240, 240, 240]`       | Standard fitting network            |
+| `fitting_net.resnet_dt` | `true`                  | Use timestep in ResNet              |
+| `numb_steps`            | `400000`-`1000000`      | Match data size and accuracy target |
+| `batch_size`            | `"auto"` or `"auto:32"` | Use common training data section    |
+
+### Loss prefactors
+
+| JSON keys                       | Start | Limit | Note                             |
+| ------------------------------- | ----- | ----- | -------------------------------- |
+| `start_pref_e` / `limit_pref_e` | 0.02  | 1     | Energy weight                    |
+| `start_pref_f` / `limit_pref_f` | 1000  | 1     | Force-dominated early training   |
+| `start_pref_v` / `limit_pref_v` | 0.02  | 1     | Use 0 when virial is not trained |
+
+## se_e2_a checklist
+
+- [ ] `sel` has one entry per element in `type_map` and, when the right values are unknown, is taken from `neighbor-stat` output.
+- [ ] `rcut` is reasonable for the chemistry, typically 6.0-9.0 Å.
+- [ ] Virial prefactors match whether virial labels are present.
+- [ ] Model is frozen and tested using the backend command selected in the top-level workflow.
+
+## References
+
+- [se_e2_a descriptor documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/train-se-e2-a.html)
+- [Training advanced options](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training-advanced.html)