Compression Of Molecular Physical fields into Reduced Spatial Sites
COMPRESS is an optimization framework that maps an all-atom (AA) molecule of M atoms to K physically parameterized sites (K < M):
S ∈ R^{M×6} → V ∈ R^{K×6}
(x, y, z, q, σ, ε)
Each site is defined by three spatial coordinates and three non-bonded interaction parameters: partial charge (q), Lennard-Jones radius (σ), and well depth (ε). The K sites are optimized to reproduce the density, electrostatic, and van der Waals (vdW) fields of the original AA molecule, evaluated on a 3D face-centered cubic (FCC) grid. The result is a fixed-size, physically interpretable molecular representation whose compression level is controlled by K.
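To make the FCC grid concrete, here is a minimal sketch of how such a grid could be laid out around a molecule. It is not taken from the COMPRESS source: the function name `fcc_grid` is hypothetical, and it assumes that the spacing is the step of an underlying cubic grid and that the buffer pads the atoms' bounding box on every side.

```python
import itertools
import math


def fcc_grid(atom_positions, spacing=0.3, buffer=5.0):
    """Return FCC lattice points covering the atoms' bounding box plus a buffer.

    Assumption: `spacing` is the step of the underlying cubic grid, so FCC
    nearest neighbours sit spacing * sqrt(2) apart. COMPRESS may define its
    grid interval differently.
    """
    lo = [min(p[d] for p in atom_positions) - buffer for d in range(3)]
    hi = [max(p[d] for p in atom_positions) + buffer for d in range(3)]
    # Number of cubic-grid nodes along each axis (small epsilon guards rounding).
    counts = [int(math.floor((hi[d] - lo[d]) / spacing + 1e-9)) + 1 for d in range(3)]
    points = []
    for i, j, k in itertools.product(*(range(n) for n in counts)):
        # Keeping only nodes with an even index sum turns a simple cubic
        # grid into a face-centered cubic lattice.
        if (i + j + k) % 2 == 0:
            points.append((lo[0] + i * spacing, lo[1] + j * spacing, lo[2] + k * spacing))
    return points


# Coarse demo grid around a single atom at the origin.
grid = fcc_grid([(0.0, 0.0, 0.0)], spacing=0.5, buffer=1.0)
```

Both the all-atom fields and the K-site fields would be evaluated at these points, so the loss compares two arrays of equal length.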
COMPRESS/
├── COMPRESS.py               # Main entry point
├── README.md
├── pyproject.toml            # Package metadata and dependencies
├── example/
│   └── test.smi              # Example input
└── script/
    ├── __init__.py
    ├── extract_params.py     # SMI/PDB → ACPYPE → params CSV
    ├── init.py               # AA and COMPRESS (CG) grid initialization
    ├── grid.py               # Grid class (field computation)
    ├── loss.py               # Loss functions
    ├── update_features.py    # L-BFGS optimization loop
    └── write_file.py         # Save results
All dependencies including OpenBabel are installed automatically:
git clone https://github.com/username/COMPRESS.git
cd COMPRESS
pip install -e .

# Latest
pip install git+https://github.com/username/COMPRESS.git
# Specific version
pip install git+https://github.com/username/COMPRESS.git@v0.1.0

After installation, the compress command is available anywhere:
compress -t smi -n benzene -s 12

- Python >= 3.10
- PyTorch >= 2.0.0
- NumPy >= 1.26.0
- pandas >= 2.0.0
- scikit-learn >= 1.3.0
- RDKit >= 2024.03.0
- acpype >= 2023.10.27
- openbabel-wheel >= 3.1.1
All of the packages above are installed automatically by pip install; only Python itself must already be present.
compress -t <type> -n <name> -s <n_sites> [options]

| Argument | Description |
|---|---|
| -t, --type | Input file type: smi or pdb |
| -n, --name | Molecule name (must match filename, e.g. benzene → benzene.smi) |
| -s, --site | Number of COMPRESS sites |

| Argument | Default | Description |
|---|---|---|
| --steps | 50 | Number of optimization steps |
| --grid_interval | 0.3 | Grid spacing (Å) |
| --grid_buffer | 5.0 | Grid buffer around molecule (Å) |
| --lr_T | 1.0 | Langevin temperature |
| --lr_noise_scale | 1e-7 | Langevin noise scale |
| --decay_T | 0.5 | Temperature decay factor |
| --decay_T_interval | 3 | Temperature decay interval (steps) |
| --tau_density | 0.2 0.5 | Tau values for density field |
| --tau_charge | 0.2 1.0 | Tau values for charge field |
| --tau_epsilon | 0.2 2.0 | Tau values for vdW epsilon field |

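For readers who want to drive the same options from their own scripts, the documented flags and defaults can be mirrored in an argparse parser. This is a sketch built only from the tables and examples in this README, not the parser in COMPRESS.py; `build_parser` is a hypothetical name, and the value types (for instance two floats per tau flag) are inferred from the listed defaults.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    p = argparse.ArgumentParser(prog="compress")
    # The rerun example in this README omits -t, so it is not marked required here.
    p.add_argument("-t", "--type", choices=["smi", "pdb"], help="Input file type")
    p.add_argument("-n", "--name", required=True, help="Molecule name (matches filename)")
    # Kept as a string because "-s all" is also accepted.
    p.add_argument("-s", "--site", required=True, help="Number of COMPRESS sites, or 'all'")
    p.add_argument("--smiles", help="SMILES string written to <name>.smi")
    p.add_argument("--steps", type=int, default=50)
    p.add_argument("--grid_interval", type=float, default=0.3)
    p.add_argument("--grid_buffer", type=float, default=5.0)
    p.add_argument("--lr_T", type=float, default=1.0)
    p.add_argument("--lr_noise_scale", type=float, default=1e-7)
    p.add_argument("--decay_T", type=float, default=0.5)
    p.add_argument("--decay_T_interval", type=int, default=3)
    # Each tau flag takes two values, matching defaults such as "0.2 0.5".
    p.add_argument("--tau_density", type=float, nargs=2, default=[0.2, 0.5])
    p.add_argument("--tau_charge", type=float, nargs=2, default=[0.2, 1.0])
    p.add_argument("--tau_epsilon", type=float, nargs=2, default=[0.2, 2.0])
    return p


# Override one option and leave the rest at their documented defaults.
args = build_parser().parse_args(["-t", "smi", "-n", "test", "-s", "12", "--steps", "100"])
```
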
An example input is provided in example/test.smi:
c1ccccc1 benzene
No input file is needed; the SMILES string is written to test.smi automatically:
compress -t smi -n test -s 12 --smiles "c1ccccc1"

Run from the directory containing test.smi:
cd example
compress -t smi -n test -s 12

If you already have a PDB file, run from the directory containing test.pdb:
compress -t pdb -n test -s 12

You can also pass -s all to generate COMPRESS representations for every K from 1 to M.

In all three cases, the pipeline runs automatically:
1. Generate test.pdb from SMILES via RDKit (Cases 1 & 2 only)
2. Run ACPYPE → test.acpype/
3. Extract atomic parameters → test_params.csv
4. Initialize AA and COMPRESS (CG) grids
5. Optimize COMPRESS sites via L-BFGS
6. Save results → test_s12_COMPRESS.pt

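The stages listed above, together with the reuse of an existing params file, can be sketched as a small planning function. This is an illustration rather than the logic in COMPRESS.py: `plan_steps` and the stage labels are hypothetical, and only the file-naming patterns come from this README.

```python
import tempfile
from pathlib import Path


def plan_steps(name, input_type, n_sites, workdir="."):
    """Return the pipeline stages that would run, plus the output path.

    Hypothetical helper: stage labels are illustrative, not COMPRESS internals.
    """
    workdir = Path(workdir)
    steps = []
    # An existing params file makes the preprocessing stages unnecessary.
    if not (workdir / f"{name}_params.csv").exists():
        if input_type == "smi":
            steps.append("generate_pdb")    # RDKit: SMILES -> PDB
        steps.append("run_acpype")          # ACPYPE -> <name>.acpype/
        steps.append("extract_params")      # -> <name>_params.csv
    steps += ["init_grids", "optimize_sites", "save_results"]
    output = workdir / f"{name}_s{n_sites}_COMPRESS.pt"
    return steps, output


# First run from SMILES, then a rerun after the params file has been written.
with tempfile.TemporaryDirectory() as tmp:
    first, out = plan_steps("test", "smi", 12, tmp)
    (Path(tmp) / "test_params.csv").touch()
    rerun, _ = plan_steps("test", "smi", 6, tmp)
```
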
If test_params.csv already exists (e.g. rerunning with a different site count), steps 1–3 are skipped automatically:
compress -n test -s 6  # reuses test_params.csv

Expected output:
>> ----------------------------------------
>> Name : test
>> Input : /path/to/example/test.smi
>> Sites : 12
>> Device : cuda
>> Output : /path/to/example/test_s12_COMPRESS.pt
>> ----------------------------------------
>> Input file found: test.smi
>> Generating PDB from SMILES: test.smi
>> Saving PDB file: test.pdb
>> Running Acpype...
>> Acpype finished successfully!
>> Extracting params from: test.acpype
>> Params saved: test_params.csv
>> ----------------------------------------
>> Running COMPRESS...
>> ----------------------------------------
>> Optimizing 50 steps...
>> Step 1 | Grid: 0.8241 | Total: 1.2034
>> Step 2 | Grid: 0.7193 | Total: 1.0871
...
>> Step 50 | Grid: 0.1023 | Total: 0.2341
>> Results saved: test_s12_COMPRESS.pt
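If the console output is captured to a file, the per-step losses can be pulled out to check convergence. The sketch below assumes the log lines keep exactly the `Step N | Grid: x | Total: y` layout shown above; `parse_steps` is a hypothetical helper, not part of COMPRESS.

```python
import re

# Matches lines such as ">> Step 1 | Grid: 0.8241 | Total: 1.2034".
STEP_RE = re.compile(
    r"Step\s+(\d+)\s*\|\s*Grid:\s*([\d.eE+-]+)\s*\|\s*Total:\s*([\d.eE+-]+)"
)


def parse_steps(log_text):
    """Extract (step, grid_loss, total_loss) tuples from the printed log."""
    return [(int(s), float(g), float(t)) for s, g, t in STEP_RE.findall(log_text)]


# Lines copied from the expected output above.
log = """>> Step 1 | Grid: 0.8241 | Total: 1.2034
>> Step 2 | Grid: 0.7193 | Total: 1.0871
>> Step 50 | Grid: 0.1023 | Total: 0.2341"""
history = parse_steps(log)
```
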
Results are saved as a PyTorch .pt file ({name}_s{n_sites}_COMPRESS.pt):
import torch
data = torch.load("test_s12_COMPRESS.pt")
data["AA_pos"] # All-atom positions (N_aa, 3)
data["AA_chg"] # All-atom charges (N_aa,)
data["AA_sig"] # All-atom sigma (N_aa,)
data["AA_eps"] # All-atom epsilon (N_aa,)
data["pos"] # COMPRESS site positions (N_cg, 3)
data["chg"] # COMPRESS site charges (N_cg,)
data["sig"] # COMPRESS site sigma (N_cg,)
data["eps"] # COMPRESS site epsilon (N_cg,)
data["loss"] # Final loss dict

example/test.smi
│
▼ RDKit (if smi)
example/test.pdb
│
▼ ACPYPE
example/test.acpype/
│
▼ extract_params
example/test_params.csv
│
▼ COMPRESS (init → optimize)
example/test_s12_COMPRESS.pt
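Once a result file has been loaded, the separate arrays can be stacked back into the K×6 layout (x, y, z, q, σ, ε) from the overview. This is a minimal sketch that relies only on the documented keys; `site_matrix` is a hypothetical helper, and the dictionary below is a stand-in with placeholder numbers rather than real COMPRESS output.

```python
def _as_list(values):
    """Convert a tensor/array (anything with .tolist()) or a sequence to a list."""
    return values.tolist() if hasattr(values, "tolist") else list(values)


def site_matrix(data, prefix=""):
    """Stack positions and parameters into rows of (x, y, z, q, sigma, epsilon).

    prefix="" selects the COMPRESS sites, prefix="AA_" the all-atom reference.
    """
    pos = _as_list(data[prefix + "pos"])
    chg = _as_list(data[prefix + "chg"])
    sig = _as_list(data[prefix + "sig"])
    eps = _as_list(data[prefix + "eps"])
    return [[*p, q, s, e] for p, q, s, e in zip(pos, chg, sig, eps)]


# Stand-in for the dict returned by torch.load; the values are placeholders.
data = {
    "pos": [[0.0, 0.0, 0.0], [1.4, 0.0, 0.0]],
    "chg": [-0.1, 0.1],
    "sig": [3.4, 3.4],
    "eps": [0.36, 0.36],
}
V = site_matrix(data)  # K rows, 6 columns
```

Because the helper only needs `.tolist()`, it accepts CPU tensors from the loaded file as well as plain lists. Passing `prefix="AA_"` produces the corresponding M×6 all-atom matrix.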
The COMPRESS_Drugs-GEOM_K25 dataset is available on Zenodo at: