Skip malformed live chain keys#443
Conversation
Greptile SummaryThis PR makes
Confidence Score: 4/5The change is safe to merge — all four call sites handle the new Option return correctly, and a targeted test covers the core skip-on-malformed behavior. The logic is straightforward and well-tested. Two minor rough edges: malformed keys are silently garbage-collected in prune_live_chain (via unwrap_or(true)) without a comment explaining the intent, and no diagnostic is emitted anywhere when a bad key is encountered, which could obscure data-corruption bugs in production. crates/storage/src/store.rs — specifically the prune_live_chain take_while closure and the lack of any logging in decode_live_chain_key on the None path.
|
| Filename | Overview |
|---|---|
| crates/storage/src/store.rs | Adds length guard to decode_live_chain_key (returns Option), updates all four call sites to skip/prune malformed keys, and adds a focused test. The unwrap_or(true) in prune_live_chain causes malformed keys to be eagerly deleted during any prune call; no diagnostic is emitted when bad keys are encountered. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Iterator yields key bytes] --> B{decode_live_chain_key}
B -->|len != 40| C[Return None]
B -->|len == 40| D[Return Some slot + root]
C --> E{Call site}
D --> E
E -->|get_live_chain| F[filter_map skips None]
E -->|max_live_chain_slot| G[filter_map skips None then max]
E -->|get_block_roots| H[filter_map skips None]
E -->|prune_live_chain take_while| I{Some or None?}
I -->|None unwrap_or true| J[Key added to delete list]
I -->|Some slot| K{slot < finalized_slot}
K -->|true| L[Key added to delete list]
K -->|false| M[Stop iteration]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[Iterator yields key bytes] --> B{decode_live_chain_key}
B -->|len != 40| C[Return None]
B -->|len == 40| D[Return Some slot + root]
C --> E{Call site}
D --> E
E -->|get_live_chain| F[filter_map skips None]
E -->|max_live_chain_slot| G[filter_map skips None then max]
E -->|get_block_roots| H[filter_map skips None]
E -->|prune_live_chain take_while| I{Some or None?}
I -->|None unwrap_or true| J[Key added to delete list]
I -->|Some slot| K{slot < finalized_slot}
K -->|true| L[Key added to delete list]
K -->|false| M[Stop iteration]
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
crates/storage/src/store.rs:871-875
**`unwrap_or(true)` silently deletes malformed keys during any prune**
When `decode_live_chain_key` returns `None`, `unwrap_or(true)` passes the predicate, so malformed keys are included in `keys_to_delete` and deleted — even if `finalized_slot` is 0 or the key sorts well after the finalization boundary. The PR description says read paths "skip" malformed entries, but in `prune_live_chain` they are always deleted. This is probably fine (garbage-collecting corrupt keys), but a comment clarifying the intent would help future readers understand why `true` (rather than `false`) is the fallback here.
### Issue 2 of 2
crates/storage/src/store.rs:472-479
**No diagnostic when malformed keys are encountered**
Malformed keys are silently skipped across all three read paths (`get_live_chain`, `max_live_chain_slot`, `get_block_roots`) and silently deleted in `prune_live_chain`. If corrupt data reaches the store due to a bug in a write path, there is no observable signal in logs or metrics. A `tracing::warn!` (or equivalent) inside `decode_live_chain_key` on the `None` branch would surface the anomaly without changing observable behavior for callers.
Reviews (1): Last reviewed commit: "Skip malformed live chain keys" | Re-trigger Greptile
| .take_while(|(k, _)| { | ||
| let (slot, _) = decode_live_chain_key(k); | ||
| slot < finalized_slot | ||
| decode_live_chain_key(k) | ||
| .map(|(slot, _)| slot < finalized_slot) | ||
| .unwrap_or(true) | ||
| }) |
There was a problem hiding this comment.
unwrap_or(true) silently deletes malformed keys during any prune
When decode_live_chain_key returns None, unwrap_or(true) passes the predicate, so malformed keys are included in keys_to_delete and deleted — even if finalized_slot is 0 or the key sorts well after the finalization boundary. The PR description says read paths "skip" malformed entries, but in prune_live_chain they are always deleted. This is probably fine (garbage-collecting corrupt keys), but a comment clarifying the intent would help future readers understand why true (rather than false) is the fallback here.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 871-875
Comment:
**`unwrap_or(true)` silently deletes malformed keys during any prune**
When `decode_live_chain_key` returns `None`, `unwrap_or(true)` passes the predicate, so malformed keys are included in `keys_to_delete` and deleted — even if `finalized_slot` is 0 or the key sorts well after the finalization boundary. The PR description says read paths "skip" malformed entries, but in `prune_live_chain` they are always deleted. This is probably fine (garbage-collecting corrupt keys), but a comment clarifying the intent would help future readers understand why `true` (rather than `false`) is the fallback here.
How can I resolve this? If you propose a fix, please make it concise.| fn decode_live_chain_key(bytes: &[u8]) -> Option<(u64, H256)> { | ||
| if bytes.len() != 40 { | ||
| return None; | ||
| } | ||
|
|
||
| let slot = u64::from_be_bytes(bytes[..8].try_into().expect("valid slot bytes")); | ||
| let root = H256::from_slice(&bytes[8..]); | ||
| (slot, root) | ||
| Some((slot, root)) |
There was a problem hiding this comment.
No diagnostic when malformed keys are encountered
Malformed keys are silently skipped across all three read paths (get_live_chain, max_live_chain_slot, get_block_roots) and silently deleted in prune_live_chain. If corrupt data reaches the store due to a bug in a write path, there is no observable signal in logs or metrics. A tracing::warn! (or equivalent) inside decode_live_chain_key on the None branch would surface the anomaly without changing observable behavior for callers.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 472-479
Comment:
**No diagnostic when malformed keys are encountered**
Malformed keys are silently skipped across all three read paths (`get_live_chain`, `max_live_chain_slot`, `get_block_roots`) and silently deleted in `prune_live_chain`. If corrupt data reaches the store due to a bug in a write path, there is no observable signal in logs or metrics. A `tracing::warn!` (or equivalent) inside `decode_live_chain_key` on the `None` branch would surface the anomaly without changing observable behavior for callers.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
pablodeymo
left a comment
There was a problem hiding this comment.
I agree with greptile comment, the function should return a Result instead of an Option for better error handling.
LiveChain iterators currently assume every key has the slot/root layout and can panic if a malformed key is present.
This makes key decoding fail safely for invalid lengths and skips those entries in the read paths, while keeping valid entries usable.