feat: Persist ExperimentRecord to MinIO and local disk for long-term reproducibility #16

Description

@profsergiocosta

Overview

Currently ExperimentRecord is stored only in Redis, which is ephemeral.
Redis can be flushed, restarted without persistence, or expire TTL-based keys.
When that happens, all provenance data — including the resolved_spec TOML
snapshot, input checksums and output paths — is permanently lost.

This issue tracks adding a second, permanent persistence layer for
ExperimentRecord in both the platform (MinIO) and CLI (local disk).


Problem

Current state:

  job_runner.py  →  executor.save()  →  output.gpkg in MinIO
                                     →  record updated in Redis only

  Redis is ephemeral — record.json is never written to MinIO.
  /reproduce and /publish depend entirely on Redis being alive.

Feature 1 — Platform: persist record.json to MinIO

Files to change

services/worker/job_runner.py

After executor.save(result, record) succeeds, write the record JSON
to MinIO alongside the output:

# After executor.save()
record.add_log(f"Completed — output={record.output_path}")

# Persist record permanently — Redis is ephemeral
_persist_record_to_minio(record)

Add the helper inside job_runner.py:

def _persist_record_to_minio(record) -> None:
    """
    Write ExperimentRecord JSON to MinIO alongside the experiment output.
    This is the permanent source of truth — Redis is ephemeral.
    """
    import io as _io
    from dissmodel.io._storage import get_default_client

    content = record.model_dump_json(indent=2).encode()
    path    = f"experiments/{record.experiment_id}/record.json"

    get_default_client().put_object(
        bucket_name  = "dissmodel-outputs",
        object_name  = path,
        data         = _io.BytesIO(content),
        length       = len(content),
        content_type = "application/json",
    )

Result in MinIO

dissmodel-outputs/
  experiments/
    abc123/
      output.gpkg        ← simulation result
      record.json        ← ExperimentRecord (includes resolved_spec TOML snapshot)
      report.md          ← if CoastalValidationExecutor
      scatter.png        ← if CoastalValidationExecutor
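
To check the key layout and upload arguments without a running MinIO server, a rough sketch along these lines may help. FakeObjectStore, record_key and persist_record are hypothetical names introduced only for this illustration; the fake accepts the same keyword arguments the helper above passes to put_object, and a plain JSON string stands in for record.model_dump_json().

```python
import io
import json


class FakeObjectStore:
    """In-memory stand-in for the MinIO client (hypothetical test double)."""

    def __init__(self):
        self.objects = {}

    def put_object(self, bucket_name, object_name, data, length, content_type):
        body = data.read()
        # The declared length should match the payload actually sent.
        assert len(body) == length
        self.objects[(bucket_name, object_name)] = (body, content_type)


def record_key(experiment_id: str) -> str:
    """Object key for a record, following the layout shown above."""
    return f"experiments/{experiment_id}/record.json"


def persist_record(client, experiment_id: str, record_json: str,
                   bucket: str = "dissmodel-outputs") -> str:
    """Upload the record JSON and return the object key that was written."""
    content = record_json.encode("utf-8")
    key = record_key(experiment_id)
    client.put_object(
        bucket_name=bucket,
        object_name=key,
        data=io.BytesIO(content),
        length=len(content),
        content_type="application/json",
    )
    return key


store = FakeObjectStore()
key = persist_record(store, "abc123", json.dumps({"experiment_id": "abc123"}))
print(key)  # experiments/abc123/record.json
```

Passing the client in as a parameter, rather than calling get_default_client() inside the helper, is one way to keep the upload testable in isolation; whether that fits job_runner.py is a design choice for the implementer.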

services/api/main.py

Update _load_record to fall back to MinIO when Redis misses:

def _load_record(experiment_id: str) -> ExperimentRecord:
    """Load record from Redis, falling back to MinIO for completed experiments."""
    # Try Redis first (fast, covers running/queued jobs)
    raw = redis_client.get(f"experiment:{experiment_id}")
    if raw:
        return ExperimentRecord.model_validate_json(raw)

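    # Fallback: load from MinIO (covers Redis restarts / TTL expiry)
    from minio.error import S3Error

    response = None
    try:
        response = minio_client.get_object(
            "dissmodel-outputs",
            f"experiments/{experiment_id}/record.json",
        )
        content = response.read().decode("utf-8")
        return ExperimentRecord.model_validate_json(content)
    except S3Error as exc:
        # Only a missing object means "not found"; re-raise real storage errors
        if exc.code != "NoSuchKey":
            raise
    finally:
        # Release the HTTP connection held by the MinIO response
        if response is not None:
            response.close()
            response.release_conn()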

    raise HTTPException(
        status_code=404,
        detail=f"Experiment '{experiment_id}' not found in Redis or MinIO"
    )
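
The lookup order can be exercised without Redis or MinIO. The following is a minimal sketch of the same cache-first, object-store-second pattern, with plain dicts standing in for both stores. load_record_json and RecordNotFound are hypothetical names for this illustration; in the API the not-found case would map to the 404 shown above.

```python
from typing import Callable, Optional


class RecordNotFound(LookupError):
    """Raised when neither store has the record (hypothetical name)."""


def load_record_json(
    experiment_id: str,
    from_cache: Callable[[str], Optional[str]],
    from_object_store: Callable[[str], Optional[str]],
) -> str:
    """Return the record JSON, preferring the cache over the object store."""
    raw = from_cache(f"experiment:{experiment_id}")
    if raw is not None:
        return raw
    raw = from_object_store(f"experiments/{experiment_id}/record.json")
    if raw is not None:
        return raw
    raise RecordNotFound(experiment_id)


# Plain dicts stand in for Redis and MinIO.
cache = {"experiment:running1": '{"status": "running"}'}
bucket = {"experiments/done1/record.json": '{"status": "completed"}'}

print(load_record_json("running1", cache.get, bucket.get))  # served from the cache
print(load_record_json("done1", cache.get, bucket.get))     # cache miss, served from the bucket
```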

Also add a GET /experiments/{id}/record endpoint that returns the full
ExperimentRecord (not just the JobResponse subset):

@app.get("/experiments/{experiment_id}/record", dependencies=AUTH)
async def get_experiment_record(experiment_id: str):
    """Return the full ExperimentRecord including resolved_spec."""
    return _load_record(experiment_id).model_dump()

Acceptance criteria

  • A completed experiment has record.json in MinIO at experiments/{id}/record.json
  • GET /job/{id} works after Redis is flushed (reads from MinIO fallback)
  • POST /experiments/{id}/reproduce works after Redis restart
  • POST /experiments/{id}/publish works after Redis restart
  • GET /experiments/{id}/record returns full ExperimentRecord with resolved_spec

Feature 2 — CLI: persist record.json next to output

File to change

dissmodel/executor/cli.py

After executor.save(), write the record JSON next to the output file:

def _cmd_run(executor_cls, args) -> None:
    record   = _build_record(args)
    executor = executor_cls()

    print("▶ Validating...")
    executor.validate(record)

    print("▶ Running...")
    result = executor.run(record)

    print("▶ Saving...")
    record = executor.save(result, record)

    # Persist record locally alongside output
    record_path = _save_record_locally(record, getattr(args, "output", None))

    print("\n✅ Completed")
    print(f"   output:  {record.output_path}")
    print(f"   record:  {record_path}")
    if record.output_sha256:
        print(f"   sha256:  {record.output_sha256[:16]}...")
    for log in record.logs:
        print(f"   {log}")


def _save_record_locally(record, output_path: str | None) -> str:
    """
    Save ExperimentRecord JSON next to the output file.
    Enables local reproducibility without a platform.

    Example:
        output: data/result.tif
        record: data/result.record.json
    """
    from pathlib import Path

    if output_path:
        p    = Path(output_path)
        path = p.with_name(p.stem + ".record.json")
    else:
        path = Path("experiment_record.json")

    path.write_text(record.model_dump_json(indent=2), encoding="utf-8")
    return str(path)

Result on disk

data/
  result.tif              ← simulation output
  result.record.json      ← ExperimentRecord (resolved_spec + checksums)
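
The naming rule is easiest to see on a few concrete paths. This sketch repeats only the path derivation from _save_record_locally; record_path_for is a hypothetical name used for the illustration.

```python
from pathlib import Path
from typing import Optional


def record_path_for(output_path: Optional[str]) -> Path:
    """Derive the record file path the same way _save_record_locally does."""
    if output_path:
        p = Path(output_path)
        # Path.stem drops only the last suffix, so "out.tar.gz" keeps ".tar"
        return p.with_name(p.stem + ".record.json")
    return Path("experiment_record.json")


for out in ["data/result.tif", "result.gpkg", "runs/out.tar.gz", None]:
    print(out, "->", record_path_for(out).as_posix())
# data/result.tif -> data/result.record.json
# result.gpkg -> result.record.json
# runs/out.tar.gz -> runs/out.tar.record.json
# None -> experiment_record.json
```

One consequence worth noting: two outputs in the same directory that differ only by extension (for example result.tif and result.gpkg) map to the same record file, so the later run overwrites the earlier record.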

Acceptance criteria

  • Running the CLI produces {stem}.record.json next to the output file
  • If --output is not specified, the CLI writes experiment_record.json in the current directory
  • record.json includes resolved_spec, source.checksum, output_sha256
  • record.json is valid JSON parseable as ExperimentRecord
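
The criteria above could be checked with a small script such as the sketch below. It uses only the standard library and makes two assumptions not stated in this issue: that source serializes as a JSON object with a checksum key, and that output_sha256 is the hex SHA-256 digest of the output file's bytes. check_record is a hypothetical helper, and the demo record is fabricated sample data.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def check_record(record_file: Path, output_file: Path) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    data = json.loads(record_file.read_text(encoding="utf-8"))
    for key in ("resolved_spec", "output_sha256"):
        if key not in data:
            problems.append(f"missing field: {key}")
    # Assumption: `source` is a JSON object that carries a `checksum` key.
    source = data.get("source")
    if not isinstance(source, dict) or "checksum" not in source:
        problems.append("missing field: source.checksum")
    # Assumption: output_sha256 is the hex SHA-256 of the output file's bytes.
    digest = hashlib.sha256(output_file.read_bytes()).hexdigest()
    if data.get("output_sha256") != digest:
        problems.append("output_sha256 does not match the output file")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "result.tif"
    out.write_bytes(b"fake raster bytes")
    rec = Path(tmp) / "result.record.json"
    rec.write_text(json.dumps({
        "resolved_spec": "[model]\nname = 'demo'\n",
        "source": {"checksum": "0" * 64},
        "output_sha256": hashlib.sha256(b"fake raster bytes").hexdigest(),
    }), encoding="utf-8")
    problems = check_record(rec, out)
    print(problems)  # []
```

The last criterion (parseable as ExperimentRecord) would additionally need ExperimentRecord.model_validate_json on the file contents, which this standalone sketch leaves out.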

Files to change

File                            Change
services/worker/job_runner.py   Add _persist_record_to_minio(), call after executor.save()
services/api/main.py            Update _load_record() with MinIO fallback; add GET /experiments/{id}/record
dissmodel/executor/cli.py       Add _save_record_locally(), call in _cmd_run() after executor.save()

Out of scope

  • Automatic Redis TTL management (separate ops concern)
  • Record indexing / search beyond experiment_id lookup
  • Zenodo deposit (tracked in separate issue)
