COMPRESS

Compression Of Molecular Physical fields into Reduced Spatial Sites

COMPRESS is an optimization framework that maps an all-atom (AA) molecule of M atoms to K physically parameterized sites (K < M):

S ∈ R^{M×6}  →  V ∈ R^{K×6}
  (x, y, z, q, σ, ε)

Each site is defined by three spatial coordinates and three non-bonded interaction parameters - partial charge (q), Lennard-Jones radius (σ), and well depth (ε). The K sites are optimized to reproduce the density, electrostatic, and van der Waals (vdW) fields of the original AA molecule on a 3D face-centered cubic (FCC) grid. This yields a fixed-size, directly physically interpretable molecular representation whose compression level is controlled by K.

Directory Structure

COMPRESS/
├── COMPRESS.py          # Main entry point
├── README.md
├── pyproject.toml       # Package metadata and dependencies
├── example/
│   └── test.smi         # Example input
└── script/
    ├── __init__.py
    ├── extract_params.py    # SMI/PDB → ACPYPE → params CSV
    ├── init.py              # AA and COMPRESS(CG) grid initialization
    ├── grid.py              # Grid class (field computation)
    ├── loss.py              # Loss functions
    ├── update_features.py   # L-BFGS optimization loop
    └── write_file.py        # Save results

Installation

Option 1: pip (recommended)

All dependencies including OpenBabel are installed automatically:

git clone https://github.com/username/COMPRESS.git
cd COMPRESS
pip install -e .

Option 2: pip from GitHub (no clone needed)

# Latest
pip install git+https://github.com/username/COMPRESS.git

# Specific version
pip install git+https://github.com/username/COMPRESS.git@v0.1.0

After installation, compress command is available anywhere:

compress -t smi -n benzene -s 12

Requirements

Python >= 3.10
PyTorch >= 2.0.0
NumPy >= 1.26.0
pandas >= 2.0.0
scikit-learn >= 1.3.0
RDKit >= 2024.03.0
acpype >= 2023.10.27
openbabel-wheel >= 3.1.1

All of the above are installed automatically via pip install.

Usage

compress -t <type> -n <name> -s <n_sites> [options]

Required Arguments

Argument	Description
`-t`, `--type`	Input file type: `smi` or `pdb`
`-n`, `--name`	Molecule name (must match filename, e.g. `benzene` → `benzene.smi`)
`-s`, `--site`	Number of COMPRESS sites

Optional Arguments

Argument	Default	Description
`--steps`	50	Number of optimization steps
`--grid_interval`	0.3	Grid spacing (Å)
`--grid_buffer`	5.0	Grid buffer around molecule (Å)
`--lr_T`	1.0	Langevin temperature
`--lr_noise_scale`	1e-7	Langevin noise scale
`--decay_T`	0.5	Temperature decay factor
`--decay_T_interval`	3	Temperature decay interval (steps)
`--tau_density`	0.2 0.5	Tau values for density field
`--tau_charge`	0.2 1.0	Tau values for charge field
`--tau_epsilon`	0.2 2.0	Tau values for VDW epsilon field

Example

An example input is provided in example/test.smi:

c1ccccc1  benzene

Case 1: SMILES string directly

No input file needed - SMILES is written to test.smi automatically:

compress -t smi -n test -s 12 --smiles "c1ccccc1"

Case 2: SMILES file

Run from the directory containing test.smi:

cd example
compress -t smi -n test -s 12

Case 3: PDB file

If you already have a PDB file, run from the directory containing test.pdb:

compress -t pdb -n test -s 12

You can also use "-s all" option to generate COMPRESS rerpesentations from K=1 to K=M

In all three cases, the pipeline runs automatically:

Generate test.pdb from SMILES via RDKit (Cases 1 & 2 only)
Run ACPYPE → test.acpype/
Extract atomic parameters → test_params.csv
Initialize AA and COMPRESS (CG) grids
Optimize COMPRESS sites via L-BFGS
Save results → test_s12_COMPRESS.pt

If test_params.csv already exists (e.g. rerunning with a different site count), steps 1–3 are skipped automatically:

compress -n test -s 6   # reuses test_params.csv

Expected output:

>> ----------------------------------------
>> Name      : test
>> Input     : /path/to/example/test.smi
>> Sites     : 12
>> Device    : cuda
>> Output    : /path/to/example/test_s12_COMPRESS.pt
>> ----------------------------------------
>> Input file found: test.smi
>> Generating PDB from SMILES: test.smi
>> Saving PDB file: test.pdb
>> Running Acpype...
>> Acpype finished successfully!
>> Extracting params from: test.acpype
>> Params saved: test_params.csv
>> ----------------------------------------
>> Running COMPRESS...
>> ----------------------------------------
>> Optimizing 50 steps...
>> Step    1 | Grid: 0.8241 | Total: 1.2034
>> Step    2 | Grid: 0.7193 | Total: 1.0871
...
>> Step   50 | Grid: 0.1023 | Total: 0.2341
>> Results saved: test_s12_COMPRESS.pt

Output

Results are saved as a PyTorch .pt file ({name}_s{n_sites}_COMPRESS.pt):

import torch

data = torch.load("test_s12_COMPRESS.pt")

data["AA_pos"]   # All-atom positions      (N_aa, 3)
data["AA_chg"]   # All-atom charges        (N_aa,)
data["AA_sig"]   # All-atom sigma          (N_aa,)
data["AA_eps"]   # All-atom epsilon        (N_aa,)

data["pos"]      # COMPRESS site positions       (N_cg, 3)
data["chg"]      # COMPRESS site charges         (N_cg,)
data["sig"]      # COMPRESS site sigma           (N_cg,)
data["eps"]      # COMPRESS site epsilon         (N_cg,)

data["loss"]     # Final loss dict

Pipeline Overview

example/test.smi
   │
   ▼ RDKit (if smi)
example/test.pdb
   │
   ▼ ACPYPE
example/test.acpype/
   │
   ▼ extract_params
example/test_params.csv
   │
   ▼ COMPRESS (init → optimize)
example/test_s12_COMPRESS.pt

Data

The COMPRESS_Drugs-GEOM_K25 dataset is available on Zenodo at:

https://doi.org/10.5281/zenodo.20072228

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

COMPRESS

Directory Structure

Installation

Option 1: pip (recommended)

Option 2: pip from GitHub (no clone needed)

Requirements

Usage

Required Arguments

Optional Arguments

Example

Case 1: SMILES string directly

Case 2: SMILES file

Case 3: PDB file

Output

Pipeline Overview

Data

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
__pycache__		__pycache__
example		example
script		script
.gitignore		.gitignore
COMPRESS.py		COMPRESS.py
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

COMPRESS

Directory Structure

Installation

Option 1: pip (recommended)

Option 2: pip from GitHub (no clone needed)

Requirements

Usage

Required Arguments

Optional Arguments

Example

Case 1: SMILES string directly

Case 2: SMILES file

Case 3: PDB file

Output

Pipeline Overview

Data

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages