Skip to content

feat: parallelize viscosity calculation#388

Open
ltalirz wants to merge 10 commits into
mainfrom
feat/parallelize-visc-2
Open

feat: parallelize viscosity calculation#388
ltalirz wants to merge 10 commits into
mainfrom
feat/parallelize-visc-2

Conversation

@ltalirz

@ltalirz ltalirz commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Run 3 different temperatures in parallel.

@github-actions github-actions Bot added the type: feature Changelog: new feature or performance improvement → bumps minor version label Jun 12, 2026
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

@ltalirz ltalirz force-pushed the feat/parallelize-visc-2 branch from 872e464 to a72d00f Compare June 12, 2026 11:15
@ltalirz ltalirz marked this pull request as draft June 12, 2026 12:10
@ltalirz

ltalirz commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@Atilaac one question: in the current "master" workflow, we run the viscosity after the melt quench.
I guess that doesn't really make sense and we should start them in parallel to the melt quench that goes to room temperature (from the same random input structure), right?

@ltalirz ltalirz added the integration Tag a PR with this to run integration tests label Jun 12, 2026
@Atilaac

Atilaac commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Since we do liquid viscosity, you can run them in parallel. Just start from the same random structure and cool it using the same cooling rate to the temperature of interest, so you can claim that this viscosity is from the same trajectory as the glass.
You only need to wait for the glass to be made for the properties that require a glass structure, such as CTE and Elastic moduli

Squashed: parallelize viscosity, import cleanup, workflow special-case, test fixes.
@ltalirz ltalirz force-pushed the feat/parallelize-visc-2 branch from bf2e6bf to aca1283 Compare July 1, 2026 09:18
Viscosity tasks now start from the freshly generated random structure and depend only on structure_generation, so they fan out in parallel with the main melt-quench instead of waiting for it. Each task still performs its own melt-quench cooling to its target temperature.
@ltalirz ltalirz force-pushed the feat/parallelize-visc-2 branch from a895385 to 6657c2f Compare July 1, 2026 09:30
@ltalirz ltalirz marked this pull request as ready for review July 1, 2026 10:43
@ltalirz

ltalirz commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

@Atilaac / @Gitdowski I think I'm making a mistake in the LAMMPS settings in the viscosity run here - can you spot it?

"Command 'mpiexec -n 3 --oversubscribe lmp_mpi -in lmp.in' returned non-zero exit status 1.
LAMMPS stdout:
t with -dE/dr.
WARNING:  Should only be flagged at inflection points (src/pair_table.cpp:466)
WARNING: New thermo_style command, previous thermo_modify settings will be lost (src/output.cpp:904)\n\nCITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE\n\nYour simulation uses code contributions which should be cited:\n- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419\nThe log file lists these citations in BibTeX format.\n\nCITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE\n
Generated 0 of 10 mixed pair_coeff terms from geometric mixing rule
WARNING: One or more atoms are time integrated more than once (src/modify.cpp:296)
...

ERROR: Lost atoms: original 6000 current 5142 (src/thermo.cpp:494)
Last command: run 10000

@ltalirz

ltalirz commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

The relevant code should be in amorphouspy/src/amorphouspy/pipelines/viscosity.py

@Atilaac

Atilaac commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

I will have a look

@Atilaac

Atilaac commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

I think I managed to find the problem and a fix. Now, should I push it here or in a separate PR?

@ltalirz

ltalirz commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Thanks a lot - feel free to push directly here

@ltalirz

ltalirz commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

Just checking whether you forgot to push or whether you found something else to check

Previously a failed pipeline blanket-marked every step 'failed' and stored
the error under a generic 'pipeline' key. Now the failed branch probes the
per-step executorlib caches to attribute the failure to the step that
actually raised, falling back to the old behaviour when it cannot be
localised.
@Atilaac

Atilaac commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

I did not forget to push, I'm still working on it.

fix: stop dangling SHIK melt block from corrupting downstream MD stages

_viscosity_simulation, elastic_simulation, and md_simulation each stripped
the melt pre-equilibration block using a substring pattern from the original block ("fix langevin ... 5000 5000"). The fixes were renamed before to langevinnve/ensemblenve and changed the temperature
to 4000 K, but these three strip sites were never updated, so a dangling Langevin thermostat + nve/limit integrator survived into every NPT/NVT stage, fighting the real ensemble and exploding the simulation.

Consolidate the melt block into a single source of truth
(lammps/potentials/_melt_block.py: melt_block_lines, strip_melt_block, set_melt_block_temperature), used by all six potential generators, the six melt-quench protocols, and the three downstream MD entry points.
Also:

- melt_quench_simulation now retunes the block to the caller's temperature_high instead of the generator's hardcoded default.
- SHIK viscosity equilibration runs at 0.1 GPa (matching the melt-quench protocol convention) instead of 0 GPa.
- melt_quench_simulation raises a clear ValueError when the resolved temperature_high equals temperature_low, instead of silently sending 0 heating/cooling steps to LAMMPS (which failed deep inside with an unrelated "Invalid dump frequency 0" error).

Fixes the "sample explodes" report for SHIK viscosity runs.
@Atilaac

Atilaac commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Can you check now if it is working on your sample? I tried many glasses with Li2O, mainly Li2O:20, SiO2:80 (the one you suggested), and Li2O:15, Na2O:18, SiO2:67. Both the viscosity and elastic moduli workflows are working locally for me.

@ltalirz

ltalirz commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

Thanks a lot @Atilaac , will do!
I'll also have a close look at your commit to understand what was the problem here

@Atilaac

Atilaac commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Check the commit message; I have a description there. Some other changes are more about reducing code duplication and have nothing to do with the problem directly.

@ltalirz

ltalirz commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

btw, I notice you included a pixi.lock update in your commit. I will revert this since it led to problems on my side. Feel free to open a separate PR for this.

ltalirz added 2 commits July 3, 2026 15:47
Caused problems locally
On a partial rerun, executorlib reads a cached parent step's SLURM queue_id
back out of its _o.h5 and injects it as an 'afterok' dependency for the
resubmitted child. That id is long purged from the scheduler, so sbatch
rejects the submission with 'Job dependency problem' (or emits
afterok:None,None,None when no live id exists).

_clear_executor_cache(failed_only=True) now strips the queue_id from every
successful cache file it keeps, so executorlib omits the dependency
entirely (the parent result is already on disk). Adds test_rerun_cache.py.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration Tag a PR with this to run integration tests type: feature Changelog: new feature or performance improvement → bumps minor version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants