Bound gossip_signatures with FIFO eviction to prevent unbounded growth#290
Bound gossip_signatures with FIFO eviction to prevent unbounded growth#290MegaRedHand merged 3 commits intomainfrom
Conversation
…y growth Replace the bare HashMap with a GossipSignatureBuffer that caps total individual signatures at 2048 (~6MB). When over capacity, the oldest data_root and all its signatures are evicted FIFO, matching the PayloadBuffer pattern. Finalization-based pruning is preserved as a complementary mechanism. Fixes #263
🤖 Kimi Code ReviewThe PR introduces a well-designed Minor Suggestions:
Verification of correctness:
Consensus safety note: LGTM – No changes required. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code ReviewFindings
Aside from that, the buffer bookkeeping itself looks consistent: I could not run the targeted tests because Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewNow I have everything needed. Let me write a thorough review. Code Review: PR #290 — Bound gossip_signatures with FIFO evictionOverviewThis PR fixes a real production crash (issue #263) by replacing an unbounded The root problem is legitimate: during finalization stalls, gossip signatures accumulate indefinitely. The fix is well-targeted and the encapsulation is clean. CorrectnessSelf-eviction edge case (minor but worth documenting): In With
Invariant: if let Some(evicted) = self.order.pop_front() {
if let Some(removed) = self.data.remove(&evicted) {
self.total_signatures -= removed.signatures.len();
} else {
debug_assert!(false, "order/data out of sync: {evicted:?} not in data");
break;
}
}
self.data.retain(|root, entry| {
// ...
self.total_signatures -= entry.signatures.len(); // borrows self.total_signatures
// ...
});This relies on Rust 2021 disjoint closure captures allowing separate borrows of Performance
Unconditional Both
DesignFixed constant vs. validator-derived cap: FIFO eviction unit is entire data_roots, not individual signatures: When a data_root with 512 sigs is evicted to make room for one new sig, TestsThe six new tests cover the core paths well. A few gaps worth noting:
Code QualityThe encapsulation is clean and the pattern is consistent with Summary
The implementation correctly solves the stated production issue. The concerns above are all minor or future-facing — none block merging. The Codex review's suggestion to derive the cap from validator count is the most worth tracking as a follow-up. Automated review by Claude (Anthropic) · sonnet · custom prompt |
Greptile SummaryThis PR introduces Confidence Score: 5/5Safe to merge — bounded buffer is correctly implemented, invariants are maintained, all 22 tests pass, and the public API is unchanged. No P0 or P1 issues found. The FIFO eviction algorithm is correct: No files require special attention.
|
| Filename | Overview |
|---|---|
| crates/storage/src/store.rs | Core change: replaces HashMap<H256, GossipDataEntry> with a bounded GossipSignatureBuffer (FIFO eviction, VecDeque-tracked order, O(1) total_signatures counter). Invariants between data, order, and total_signatures are correctly maintained across insert, delete, and prune. Six new unit tests cover all stated cases. No API-breaking changes. |
| crates/storage/Cargo.toml | Adds leansig and rand to [dev-dependencies] only, required to generate real XMSS signatures in the new unit tests. No production dependency changes. |
| Cargo.lock | Lock file updated to reflect new dev-dependencies (leansig, rand 0.10.0) for ethlambda-storage. No new production dependencies. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[insert hashed, validator_id, signature] --> B{data_root in data?}
B -- Yes --> C{validator_id in signatures?}
C -- Yes --> D[return / first-write-wins]
C -- No --> E[insert signature into BTreeMap\ntotal_signatures += 1]
B -- No --> F[create new GossipDataEntry\ninsert into data\npush data_root to order back\ntotal_signatures += 1]
E --> G{total_signatures > capacity?}
F --> G
G -- No --> H[done]
G -- Yes --> I[pop oldest data_root from order front]
I --> J[remove entry from data\ntotal_signatures -= entry.len]
J --> G
subgraph delete [delete keys]
K[for each vid,data_root] --> L{entry exists?}
L -- No --> M[skip]
L -- Yes --> N[remove vid from signatures\ntotal_signatures -= 1]
N --> O{signatures empty?}
O -- Yes --> P[remove from data\nadd to emptied_roots]
O -- No --> K
P --> K
K --> Q[VecDeque.retain: remove emptied_roots]
end
subgraph prune [prune finalized_slot]
R[data.retain: keep slot > finalized_slot\naccumulate pruned_roots\ntotal_signatures -= pruned.len] --> S[VecDeque.retain: remove pruned_roots]
end
Reviews (1): Last reviewed commit: "Bound gossip_signatures with FIFO evicti..." | Re-trigger Greptile
Overwrites the existing signature when the same (validator_id, data_root) is inserted again, keeping the latest signature. The total_signatures counter is only incremented for genuinely new entries.
Motivation
Fixes #263. During a finalization stall,
gossip_signaturesgrows without limit because it is only pruned when finalization advances. In the devnet4 test run (2026-04-07/08, ~18.5h), finalization stalled at slot 10733 for 6+ hours. During this period, every gossip attestation from every validator was inserted and never removed, withattestation_countin produced blocks growing from ~5 to 797. This contributed to the RocksDB file descriptor exhaustion crash on node 0.Description
Replaces the unbounded
HashMap<H256, GossipDataEntry>with a newGossipSignatureBufferstruct that enforces a hard cap on total individual signatures, following the same FIFO eviction pattern already used byPayloadBufferfor aggregated payloads.New
GossipSignatureBufferstructdataHashMap<H256, GossipDataEntry>orderVecDeque<H256>capacityusizetotal_signaturestotal_signaturesusizeEviction algorithm
Capacity
GOSSIP_SIGNATURE_CAP = 2048individual signatures:AGGREGATED_PAYLOAD_CAP = 512What stays the same
delete_gossip_signatures()still removes consumed sigs after aggregationStoremethods keep their exact signatures and return typesblockchain/src/store.rscompiles unchangedPerformance note
The
delete()method collects emptied data_roots during iteration and batch-cleans the VecDeque with a singleretain()pass, avoiding per-root O(n) scans.Tests added
6 new unit tests covering:
How to Test
Devnet validation: run a multi-client devnet, observe
lean_gossip_signaturesmetric stays bounded even if finalization stalls.Closes
Closes #263