
feat(storage): diff-layer state storage with bounded pruning #444

Draft
MegaRedHand wants to merge 8 commits into main from feat/state-diff-layers

MegaRedHand commented Jun 18, 2026

Summary

Replaces aggressive state pruning with a diff-layer storage model so the full state history stays available cheaply, and relaxes block pruning to keep headers/bodies forever while only dropping old finalized signatures.

State storage

  • Every non-genesis state is stored as a parent-linked StateDiff (StateDiffs table, never pruned) plus a full-state snapshot (States) at anchors and hot states.
  • StateDiff stores slot, both checkpoints, and the justification fields in full, plus the appended historical_block_hashes tail. config/validators come from the nearest snapshot (they never change); latest_block_header is read back from BlockHeaders (the stored state caches the real state_root there, so it matches byte-for-byte).
  • get_state returns the snapshot directly if one exists; otherwise it follows base_root links back to the nearest ancestor snapshot and replays the appended tails forward.
  • Anchors recorded every 1024 slots (StateAnchors, permanent) keep their snapshots, so a reconstruction walk never has to cross more than one 1024-slot window.
  • Snapshot eviction (prune_old_states) keeps the last SNAPSHOT_HOT_WINDOW = 300 slots + anchors + finalized/justified/head; evicted snapshots leave their diff behind.
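A minimal sketch of the reconstruction path described above, written in Rust. It is not the storage crate's code: `State` and `StateDiff` are cut down to a slot, a validator list, and `historical_block_hashes`, and two `HashMap`s stand in for the `States` and `StateDiffs` tables. The `BlockHeaders` lookup and the checkpoint/justification fields are omitted, and the `Tables` type with its `get_state` method is hypothetical.

```rust
use std::collections::HashMap;

type Root = [u8; 32];

/// Simplified stand-in for the full state (hypothetical subset of fields).
#[derive(Clone, Debug, PartialEq)]
struct State {
    slot: u64,
    validators: Vec<u64>,
    historical_block_hashes: Vec<Root>,
}

/// Simplified diff: overwritten fields plus the appended hbh tail,
/// linked to its parent through `base_root`.
#[derive(Clone, Debug)]
struct StateDiff {
    base_root: Root,
    slot: u64,
    appended_hashes: Vec<Root>,
}

/// In-memory stand-ins for the `States` (snapshots) and `StateDiffs` tables.
struct Tables {
    snapshots: HashMap<Root, State>,
    diffs: HashMap<Root, StateDiff>,
}

impl Tables {
    fn get_state(&self, root: Root) -> Option<State> {
        // Fast path: a full snapshot exists for this root.
        if let Some(state) = self.snapshots.get(&root) {
            return Some(state.clone());
        }
        // Walk parent links back to the nearest ancestor snapshot,
        // collecting the diffs passed along the way (newest first).
        let mut chain = Vec::new();
        let mut cursor = root;
        let base = loop {
            let diff = self.diffs.get(&cursor)?; // missing diff: state unknown
            chain.push(diff);
            cursor = diff.base_root;
            if let Some(snapshot) = self.snapshots.get(&cursor) {
                break snapshot.clone();
            }
        };
        // Replay forward (oldest diff first). Validators are carried over
        // from the base snapshot unchanged; slot is overwritten and each
        // diff's tail is appended to historical_block_hashes.
        let mut state = base;
        for diff in chain.into_iter().rev() {
            state.slot = diff.slot;
            state
                .historical_block_hashes
                .extend_from_slice(&diff.appended_hashes);
        }
        Some(state)
    }
}
```

The loop runs once per diff between the requested state and its nearest snapshot; keeping anchor snapshots permanently is what caps that distance.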

Block pruning

  • prune_old_block_signatures(finalized_slot, tip_slot): with cutoff = tip_slot - SIGNATURE_PRUNING_RANGE, prune signatures for slots < cutoff only when cutoff <= finalized_slot (healthy finality); during deep non-finality (tip_slot - finalized_slot > SIGNATURE_PRUNING_RANGE) prune nothing.
  • BlockHeaders and BlockBodies are kept forever; all non-finalized signatures are always retained.
  • Consequence: get_signed_block returns None for a finalized block whose signature has been pruned, so old signed blocks can no longer be served via BlocksByRoot; peers that need deep history use checkpoint sync instead.
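The cutoff rule can be written as a small pure function. This is a sketch, not the PR's prune_old_block_signatures: the window size is passed in as `range` because the value of SIGNATURE_PRUNING_RANGE is not given here, treating `tip_slot < range` as "prune nothing" (via `checked_sub`) is an assumption about the early-chain case, and both function names are hypothetical.

```rust
/// Exclusive upper bound of slots whose signatures may be pruned,
/// or `None` when nothing should be pruned (hypothetical helper).
fn signature_prune_cutoff(finalized_slot: u64, tip_slot: u64, range: u64) -> Option<u64> {
    // Early chain (assumption): tip_slot < range leaves no window to prune.
    let cutoff = tip_slot.checked_sub(range)?;
    // Deep non-finality: the cutoff lies beyond the finalized slot, so
    // pruning could touch non-finalized signatures. Prune nothing.
    if cutoff > finalized_slot {
        return None;
    }
    Some(cutoff)
}

/// Whether the signature of a block at `slot` would be dropped.
fn should_prune_signature(slot: u64, finalized_slot: u64, tip_slot: u64, range: u64) -> bool {
    signature_prune_cutoff(finalized_slot, tip_slot, range).is_some_and(|cutoff| slot < cutoff)
}
```

Because the cutoff is only accepted when it is at or below the finalized slot, every pruned signature belongs to a finalized block, matching the rule that non-finalized signatures are always retained.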

Tests

  • New: StateDiff build/SSZ round-trip; state reconstruction (single + multi-diff after eviction); anchor recording; snapshot eviction (window/protected/anchors); signature pruning (healthy window / deep non-finality / early chain).
  • Storage + blockchain suites green; clippy -D warnings clean.

Status / follow-ups

  • 🔲 Draft: soak-testing on a local 4-node devnet (finalizing healthily).
  • prune_old_data runs on the node's finalization path (blockchain/src/lib.rs), so snapshot eviction + reconstruction are exercised after ~300 slots.
  • BlockBodies now grows unbounded; pruning bodies on a longer window is a possible follow-up.
  • Migration: existing DBs keep their full states as snapshots; diffs start accruing going forward (no backfill).

https://claude.ai/code/session_01RnSujepExeyvKWRsSdZxFN

MegaRedHand force-pushed the feat/state-diff-layers branch from d9436b6 to 7e76c36 on June 18, 2026 19:31
Store every non-genesis state as a parent-linked diff (StateDiffs, never
pruned) plus full-state snapshots (States) at anchors and hot states, so
the full state history stays reconstructable cheaply and aggressive state
pruning is no longer needed.

- StateDiff stores slot, checkpoints, and the justification fields in full,
  plus the appended historical_block_hashes tail; config/validators come
  from the nearest snapshot and latest_block_header from BlockHeaders.
- get_state reconstructs by walking parent diffs to the nearest snapshot;
  1024-slot anchors (StateAnchors) bound the walk.
- Snapshot eviction (prune_old_states) keeps the last 300 slots + anchors
  + finalized/justified/head; evicted snapshots leave their diff behind.
- Block signatures are pruned only for old finalized blocks, keeping a
  recent SIGNATURE_PRUNING_RANGE window and all non-finalized signatures;
  block headers and bodies are kept forever.

Claude-Session: https://claude.ai/code/session_01RnSujepExeyvKWRsSdZxFN
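A sketch of the snapshot eviction rule in Rust, under stated assumptions: the hot window is measured back from a `current_slot` (the description does not say which slot the window is anchored to), the `States` table is modeled as a root-to-slot `HashMap`, and `Protected`, `keep_snapshot`, and `evict_cold_snapshots` are hypothetical names rather than the crate's API.

```rust
use std::collections::{HashMap, HashSet};

type Root = [u8; 32];

/// Hot window size taken from the PR description.
const SNAPSHOT_HOT_WINDOW: u64 = 300;

/// Roots whose snapshots are always kept (hypothetical grouping).
struct Protected {
    finalized: Root,
    justified: Root,
    head: Root,
    /// Roots recorded in `StateAnchors`.
    anchors: HashSet<Root>,
}

/// Whether the snapshot of `root` at `slot` survives eviction.
/// `current_slot` as the window's reference point is an assumption.
fn keep_snapshot(root: Root, slot: u64, current_slot: u64, p: &Protected) -> bool {
    let in_hot_window = current_slot.saturating_sub(slot) < SNAPSHOT_HOT_WINDOW;
    in_hot_window
        || p.anchors.contains(&root)
        || root == p.finalized
        || root == p.justified
        || root == p.head
}

/// Hypothetical eviction pass over a root -> slot index of `States`.
/// Only snapshot entries are dropped; `StateDiffs` is not touched, so
/// evicted states stay reconstructable. Returns the number evicted.
fn evict_cold_snapshots(
    snapshots: &mut HashMap<Root, u64>,
    current_slot: u64,
    p: &Protected,
) -> usize {
    let before = snapshots.len();
    snapshots.retain(|root, slot| keep_snapshot(*root, *slot, current_slot, p));
    before - snapshots.len()
}
```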
MegaRedHand force-pushed the feat/state-diff-layers branch from 7e76c36 to b97129f on June 18, 2026 19:48
- Bundle the parent base of a diff into a `DiffBase { root, hbh_len, slot }`
  struct, shrinking `insert_state_with_diff` from five positional args to three.
- Register the anchor bootstrap snapshot in `StateAnchors` from `init_store`, so
  the genesis or checkpoint-sync state (the base of every diff chain) is never
  evicted.
  Previously it could be pruned once finality advanced past the hot window,
  making the first 1024-slot window unreconstructable.
- Enforce the invariant that a `States` snapshot is never written alone: it is
  always paired with a `StateDiffs` (parented states) or `StateAnchors`
  (bootstrap) entry. Plain `insert_state`, which writes `States` alone, is now
  only used by tests, so it is gated `#[cfg(test)]`.

Claude-Session: https://claude.ai/code/session_01RnSujepExeyvKWRsSdZxFN
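One way to express the snapshot-pairing invariant, sketched with hypothetical in-memory tables: the only write paths put a `States` entry together with either a `StateDiffs` entry (parented state) or a `StateAnchors` entry (bootstrap state), and eviction only removes from `States`, so pruning cannot break the invariant. `Store`, `write_parented`, `write_bootstrap`, and `invariant_holds` are illustrative names, not the crate's API, and whether the real writes share one atomic batch is not stated here.

```rust
use std::collections::{HashMap, HashSet};

type Root = [u8; 32];

/// Stand-ins for the stored payloads (hypothetical, heavily simplified).
#[derive(Clone, Debug, PartialEq)]
struct State {
    slot: u64,
}

#[derive(Clone, Debug, PartialEq)]
struct StateDiff {
    base_root: Root,
    slot: u64,
}

#[derive(Default)]
struct Store {
    states: HashMap<Root, State>,          // `States` snapshots
    state_diffs: HashMap<Root, StateDiff>, // `StateDiffs` (never pruned)
    state_anchors: HashSet<Root>,          // `StateAnchors` (permanent)
}

impl Store {
    /// Bootstrap (genesis / checkpoint-sync) state: snapshot plus anchor,
    /// so the base of every diff chain is protected from eviction.
    fn write_bootstrap(&mut self, root: Root, state: State) {
        self.states.insert(root, state);
        self.state_anchors.insert(root);
    }

    /// Parented state: snapshot plus its diff, always written together.
    fn write_parented(&mut self, root: Root, state: State, diff: StateDiff) {
        self.states.insert(root, state);
        self.state_diffs.insert(root, diff);
    }

    /// Every snapshot must be backed by a diff or an anchor.
    fn invariant_holds(&self) -> bool {
        self.states
            .keys()
            .all(|r| self.state_diffs.contains_key(r) || self.state_anchors.contains(r))
    }
}
```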
Replace the verbose field-by-field DiffBase literal at the block-import call
site with `DiffBase::from_state(parent_root, &parent_state)`: the caller passes
the already-known parent root and the constructor reads hbh_len/slot from the
state.

Claude-Session: https://claude.ai/code/session_01RnSujepExeyvKWRsSdZxFN
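A sketch of what `DiffBase::from_state` and the diff it feeds might look like. The `DiffBase` fields follow the commit description (`root`, `hbh_len`, `slot`); their concrete types, the simplified `State`, and the `build_diff` helper are assumptions. What it illustrates is why `hbh_len` is carried: it marks where the parent's `historical_block_hashes` end, so only the appended tail has to be stored in the diff.

```rust
type Root = [u8; 32];

/// Simplified state (hypothetical subset of the real fields).
#[derive(Clone, Debug)]
struct State {
    slot: u64,
    historical_block_hashes: Vec<Root>,
}

/// Parent base of a diff; field types are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DiffBase {
    root: Root,
    hbh_len: usize,
    slot: u64,
}

impl DiffBase {
    /// The caller passes the already-known parent root; the length of the
    /// historical_block_hashes list and the slot are read from the state.
    fn from_state(root: Root, parent: &State) -> Self {
        DiffBase {
            root,
            hbh_len: parent.historical_block_hashes.len(),
            slot: parent.slot,
        }
    }
}

#[derive(Debug, PartialEq)]
struct StateDiff {
    base_root: Root,
    slot: u64,
    appended_hashes: Vec<Root>,
}

/// Hypothetical helper: the diff keeps only the hbh entries past the
/// parent's length, plus the child's own slot.
fn build_diff(base: &DiffBase, child: &State) -> StateDiff {
    debug_assert!(child.historical_block_hashes.len() >= base.hbh_len);
    debug_assert!(child.slot > base.slot);
    StateDiff {
        base_root: base.root,
        slot: child.slot,
        appended_hashes: child.historical_block_hashes[base.hbh_len..].to_vec(),
    }
}
```

Building the base from the parent state in one call keeps the block-import call site short and avoids passing `hbh_len` and `slot` as separate positional arguments that could be mixed up.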
StateDiff and DiffBase are storage-persistence details, not shared consensus
types, so move them from `ethlambda-types` into a new `state_diff` module in the
storage crate (adds the `libssz-types` dep). `DiffBase` fields are now
crate-internal (`pub(crate)`); external callers construct via
`DiffBase::from_state`.

Claude-Session: https://claude.ai/code/session_01RnSujepExeyvKWRsSdZxFN
… to state_diff

- Drop the `#[cfg(test)]` `Store::insert_state` method (no production caller);
  tests seed a base snapshot via a plain `insert_snapshot` test helper.
- Move the snapshot + diffs -> State assembly out of `Store::reconstruct_state`
  into `state_diff::reconstruct`; the store keeps only the diff-walk and header
  fetch.
- Inline the one-line `get_state_snapshot` / `get_state_diff` wrappers into their
  call sites via `get_ssz`.

Claude-Session: https://claude.ai/code/session_01RnSujepExeyvKWRsSdZxFN