Skip to content

feat: add gstools.mps — Multiple Point Statistics via Direct Sampling#416

Draft
n0228a wants to merge 11 commits into
GeoStat-Framework:mainfrom
n0228a:mps-direct-sampling
Draft

feat: add gstools.mps — Multiple Point Statistics via Direct Sampling#416
n0228a wants to merge 11 commits into
GeoStat-Framework:mainfrom
n0228a:mps-direct-sampling

Conversation

@n0228a

@n0228a n0228a commented Jun 25, 2026

Copy link
Copy Markdown

Summary

Adds a new gstools.mps subpackage implementing Multiple Point Statistics (MPS)
simulation via the Direct Sampling (DS) algorithm (Mariethoz et al. 2010) with
the DSBC parametrization (Juda et al. 2022). MPS is a fundamentally different
paradigm from two-point geostatistics: instead of a parametric covariance model,
spatial structure is learned directly from a training image (TI), enabling
reproduction of curvilinear, connected patterns — channels, fractures, categorical
facies — that variogram-based methods cannot capture.

Supersedes #409.


Public API

Four classes exported from gstools.mps. DirectSampling, MPSModel, and
TrainingImage are also available at the top-level gstools namespace;
Variable is available as gstools.mps.Variable.

Variable

Holds one variable's data and its per-variable configuration:

  • data — n-D NumPy array
  • categorical — whether to use weighted mismatch distance (Mariethoz2010 Eq. 3)
  • distance — continuous metric: "l1" (Juda2022 Eq. 7, default), "l2",
    "l<p>" (arbitrary Minkowski), "variation" or "variation<p>" (Mariethoz2010
    Eq. 9, with automatic mean-shift correction)
  • n_neighbors — data-event size (maximum neighbours, default 32)
  • max_radius — optional Euclidean cap on neighbour selection
  • weight — relative weight in the multivariate joint distance; None for equal
    share; weights are automatically normalized and do not need to sum to 1

TrainingImage

The MPS analogue of CovModel. Three construction modes:

# Univariate — plain array + keyword args
ti = gs.TrainingImage(ti_array, categorical=True, n_neighbors=30)

# Univariate — pre-configured Variable
ti = gs.TrainingImage(gs.mps.Variable("z", ti_array, categorical=True, n_neighbors=30))

# Multivariate — list of Variable objects
ti = gs.TrainingImage([
    gs.mps.Variable("facies",   facies_arr, categorical=True,  weight=0.6, n_neighbors=30),
    gs.mps.Variable("porosity", por_arr,    categorical=False, weight=0.4,
                    distance="l2", n_neighbors=15),
])

Shared option: distance_power — spatial-decay exponent δ for neighbour
weighting (Mariethoz2010 Eq. 5); 0.0 (default) gives uniform weights.

NaN masking — undefined TI cells (NaN) are excluded from distance
comparisons and never pasted into the simulation grid.

MPSModel

Configuration object bundling a TrainingImage with search parameters:
scan_fraction, threshold, cond_weight, boundary.

model = gs.MPSModel(ti, scan_fraction=0.2, threshold=0.0)

DirectSampling

Subclasses gstools.field.base.Field and follows the same call interface as
SRF. Takes an MPSModel.

All search parameters (scan_fraction, threshold, cond_weight, boundary,
n_neighbors, max_radius) are read-only on DirectSampling — they
reflect the underlying MPSModel and Variable objects. n_neighbors and
max_radius collapse to a scalar when all variables share the same value,
otherwise return a {name: value} dict. To change them, mutate the MPSModel
or Variable directly.

Key capabilities:

  • n-dimensional structured grids — fully dimension-agnostic
  • Randomised simulation path (Mariethoz2010 §3 ¶13); also accepts
    "sequential" or an explicit (N, dim) visit-order array via the path
    argument
  • scan_fraction caps the per-node TI scan (Juda2022 §2)
  • threshold=0.0 — DSBC mode: full scan, best-candidate selection
  • boundary="strict" (default) or "partial" — partial drops lags that exceed
    the TI extent and searches with the reduced template (Mariethoz2010 §6.2)
  • set_condition(cond_pos, cond_val) — hard-data conditioning with nearest-node
    snapping and per-variable collision resolution; cond_val accepts a
    {variable: array} dict for multivariate TIs, with np.nan to leave a
    variable unconditioned at a point; conditioning weight is set via MPSModel
  • set_nonstationary() — per-node rotation and anisotropy maps for geometric
    lag transform (Mariethoz2010 §6.2); enables spatially varying orientation
    without modifying the TI
  • set_mv_transforms() — per-variable mean / normalizer / trend post-processing
  • DAG-based parallelism via num_threads (or gstools.config.NUM_THREADS)
  • Dual RNG seeds (path_seed, node_seed) for independent control of visit
    order vs. TI scan randomness
  • Multivariate return value: {variable: ndarray} dict; each variable also
    stored as a named field accessible via ds["facies"] / ds.all_fields

Usage

import numpy as np
import gstools as gs

# --- Univariate categorical, conditioned ---
ti = gs.TrainingImage(ti_array, categorical=True, n_neighbors=30)
ds = gs.DirectSampling(gs.MPSModel(ti, scan_fraction=0.2, threshold=0.0))
ds.set_condition([cond_x, cond_y], cond_val)
field = ds([np.arange(100, dtype=float)] * 2, seed=42)

# --- Multivariate co-simulation ---
ti_mv = gs.TrainingImage([
    gs.mps.Variable("facies",   facies_arr, categorical=True,  weight=0.6, n_neighbors=30),
    gs.mps.Variable("porosity", por_arr,    categorical=False, weight=0.4,
                    distance="l2", n_neighbors=15),
])
ds_mv = gs.DirectSampling(gs.MPSModel(ti_mv, threshold=0.05))
fields = ds_mv([np.arange(80, dtype=float)] * 2, seed=0)
# fields["facies"], fields["porosity"]

Module layout

File Responsibility
training_image.py Variable, TrainingImage — data and distance functions
data_event.py DataEvent — internal value object bundling per-variable data-event arrays
distance.py Stateless vectorised distance helpers
model.py MPSModel — validated search-parameter config
neighbors.py Neighbour selection, geometric lag transform, search-window bounds
scan.py TI scanning loop
runner.py Simulation path and DAG thread executor
simulate.py _DirectSamplingEngine + ds_simulate entry point
direct_sampling.py DirectSampling — public Field subclass

Examples (examples/13_mps/)

File What it shows
00_simple_unconditional.py Minimal unconditional simulation on a synthetic channel TI
01_conditional.py Hard-data conditioning with set_condition()
02_continuous.py Continuous variable simulation
03_channel_strebelle.py Strebelle (2002) fluvial TI, conditional
04_channel_strebelle_nonstat.py Non-stationary simulation with spatially varying rotation
05_channel_strebelle_radial.py Radial non-stationarity with custom spiral path; reproduces Mariethoz2010 Fig. 7
06_channel_multivariate_fig8.py Multivariate co-simulation; reproduces Mariethoz2010 Fig. 8
07_variation_distance_fig6.py Variation-distance metric; reproduces Mariethoz2010 Fig. 6
08_bivariate_joint_fig4.py Bivariate joint simulation; reproduces Mariethoz2010 Fig. 4
09_continuous_stone_fig3.py Continuous simulation on the GAIA stone TI; reproduces Mariethoz2010 Fig. 3
00_mps_overview.ipynb End-to-end notebook covering all modes

References

  • Mariethoz, G., Renard, P., Straubhaar, J. (2010). The Direct Sampling method
    to perform multiple-point geostatistical simulations. Water Resources Research.
    https://doi.org/10.1029/2008WR007621
  • Juda, P., Renard, P., Straubhaar, J. (2022). A parsimonious parametrization of
    the Direct Sampling algorithm for multiple-point statistical simulations.
    Applied Computing and Geosciences.
    https://doi.org/10.1016/j.acags.2022.100091
  • Strebelle, S. (2002). Conditional simulation of complex geological structures
    using multiple-point statistics. Mathematical Geology.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant