fix(drive): unify shielded pool genesis/upgrade construction to prevent state divergence#3801
Conversation
…nt state divergence CONSENSUS-CRITICAL. The v12 shielded pool [ShieldedBalances] subtree was built two different ways that were NOT byte-identical: - Genesis path (create_initial_state_structure_v3 → initial_state_structure_shielded_pool_operations) built the pool and its eight children via a sorted GroveDbOpBatch. GroveDB roots a sorted batch at its median key, so the pool's parent Merk was rooted at [160]. - Upgrade path (Platform::transition_to_version_12) built the same elements with sequential breadth-first grove_insert_if_not_exists calls, rooting the parent Merk at [128] — the intended NOTES-at-root layout documented in drive::drive::shielded::paths. A state-synced fresh-genesis-v12 node (160) and an in-place-upgraded v12 node (128) therefore produced different [ShieldedBalances] subtree root hashes and would have disagreed on the app hash at the v11→v12 boundary block. v12 is not yet activated, so no live network has forked; the live mainnet/testnet upgrade converges on the correct sequential [128] layout — but the invariant that genesis and upgrade build identical state was broken. Fix: introduce one shared sequential builder, Drive::insert_shielded_pool_structure, and call it from BOTH paths: - transition_to_version_12 now delegates the pool + eight children to it (behavior unchanged: still [128]). - create_initial_state_structure_v3 removes the shielded ops from the batch and calls the shared helper AFTER the batch apply (so the top-level [ShieldedBalances] SumTree exists first). Genesis now builds the pool sequentially → [128], matching the upgrade by construction. v11 is intentionally NOT touched. The same batch-vs-sequential pattern exists at the v10→v11 boundary, but v11 is already activated on mainnet/testnet, so changing its construction would itself fork the live network. Tests (RED→GREEN, driving the REAL production functions): - test_genesis_v12_and_upgrade_to_v12_build_identical_shielded_pool: genesis-v12 vs genesis-v11+transition_to_version_12, asserts the entire [ShieldedBalances] subtree is byte-identical at every depth. Would have caught this in CI: ✖ before fix (genesis pool root [160] ≠ upgrade [128]), ✔ after (both [128]). - test_genesis_v11_and_upgrade_to_v11_build_identical_address_trees: read-only tripwire confirming the v11 SavedBlockTransactions/AddressBalances subtrees already coincide across batch (genesis) and sequential (upgrade) paths. GREEN. The tests compare the named subtree (not the whole-DB root hash) on purpose: the whole DB also carries the genesis epoch's recorded protocol-version field, which legitimately differs between a chain born at vN and one born at v(N-1) then upgraded, and is unrelated to subtree construction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
✅ Review complete (commit acb41a6) |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughCentralizes ShieldedBalances subtree construction into Drive::insert_shielded_pool_structure, updates genesis (v3) and upgrade (v12) flows to call it, and adds recursive subtree-diff tests that assert byte-identical GroveDB subtrees across genesis and upgrade paths. ChangesShared Shielded Pool Initialization for Genesis and Upgrade
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@packages/rs-drive-abci/src/execution/platform_events/protocol_upgrade/perform_events_on_first_block_of_protocol_change/v0/mod.rs`:
- Around line 703-718: The current read_level() masks grove_get_raw_path_query()
failures by using unwrap_or_default(), causing read errors to be treated as
empty maps and letting collect_subtree_diffs() report false "no diffs"; change
read_level() to propagate the error instead of defaulting: remove
unwrap_or_default() and return a Result (or an Option that signals failure) from
read_level(), propagate that Result up to callers (including the code that calls
collect_subtree_diffs()), and ensure callers handle the Err case by failing the
equivalence guard or recording the read error; update any signature references
to read_level(), and ensure grove_get_raw_path_query() errors (from
QueryResultType usage) are surfaced rather than converted into {}.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 974fc056-37bd-409d-8a75-5a8039e8c0af
📒 Files selected for processing (4)
packages/rs-drive-abci/src/execution/platform_events/protocol_upgrade/perform_events_on_first_block_of_protocol_change/v0/mod.rspackages/rs-drive/src/drive/initialization/v3/mod.rspackages/rs-drive/src/drive/shielded/insert_shielded_pool_structure.rspackages/rs-drive/src/drive/shielded/mod.rs
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
PR is a well-scoped consensus-divergence fix: a single shared sequential builder is now used for both the v12 genesis path and the v11→v12 upgrade path, with byte-equivalence tests pinning the invariant. The only verified in-scope issue is a test-quality concern — the equivalence helper read_level swallows GroveDB query errors via unwrap_or_default(), which could mask future regressions as false greens. No blocking issues; production code paths look correct.
🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-drive-abci/src/execution/platform_events/protocol_upgrade/perform_events_on_first_block_of_protocol_change/v0/mod.rs`:
- [SUGGESTION] packages/rs-drive-abci/src/execution/platform_events/protocol_upgrade/perform_events_on_first_block_of_protocol_change/v0/mod.rs:703-718: `read_level` swallows GroveDB query errors, weakening the equivalence guard
`read_level` folds any failure from `grove_get_raw_path_query` into an empty `BTreeMap` via `.unwrap_or_default()`. This is the sole observation primitive driving `collect_subtree_diffs`, which backs `test_genesis_v12_and_upgrade_to_v12_build_identical_shielded_pool` and the v11 tripwire — tests whose entire purpose is to catch consensus-forking state divergence. If a query errors symmetrically on both sides (path-shape regression, stub change, future version-routing bug), both maps become `{}`, no diffs are recorded, and the tests pass green while observing nothing. Distinguish a legitimately empty subtree from a real `Err(_)` by panicking (or recording an explicit diff) on the error case.
| platform | ||
| .drive | ||
| .grove_get_raw_path_query( | ||
| &pq, | ||
| txn, | ||
| QueryResultType::QueryKeyElementPairResultType, | ||
| &mut vec![], | ||
| &platform | ||
| .state | ||
| .load() | ||
| .current_platform_version() | ||
| .expect("platform version") | ||
| .drive, | ||
| ) | ||
| .map(|(r, _)| r.to_key_elements().into_iter().collect()) | ||
| .unwrap_or_default() |
There was a problem hiding this comment.
🟡 Suggestion: read_level swallows GroveDB query errors, weakening the equivalence guard
read_level folds any failure from grove_get_raw_path_query into an empty BTreeMap via .unwrap_or_default(). This is the sole observation primitive driving collect_subtree_diffs, which backs test_genesis_v12_and_upgrade_to_v12_build_identical_shielded_pool and the v11 tripwire — tests whose entire purpose is to catch consensus-forking state divergence. If a query errors symmetrically on both sides (path-shape regression, stub change, future version-routing bug), both maps become {}, no diffs are recorded, and the tests pass green while observing nothing. Distinguish a legitimately empty subtree from a real Err(_) by panicking (or recording an explicit diff) on the error case.
| platform | |
| .drive | |
| .grove_get_raw_path_query( | |
| &pq, | |
| txn, | |
| QueryResultType::QueryKeyElementPairResultType, | |
| &mut vec![], | |
| &platform | |
| .state | |
| .load() | |
| .current_platform_version() | |
| .expect("platform version") | |
| .drive, | |
| ) | |
| .map(|(r, _)| r.to_key_elements().into_iter().collect()) | |
| .unwrap_or_default() | |
| ) | |
| .map(|(r, _)| r.to_key_elements().into_iter().collect()) | |
| .unwrap_or_else(|e| panic!("failed to read subtree at {:?}: {e}", path)) |
source: ['claude', 'coderabbit']
There was a problem hiding this comment.
Resolved in acb41a6 — read_level swallows GroveDB query errors, weakening the equivalence guard no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## v3.1-dev #3801 +/- ##
============================================
- Coverage 87.19% 87.19% -0.01%
============================================
Files 2624 2625 +1
Lines 321186 321360 +174
============================================
+ Hits 280046 280196 +150
- Misses 41140 41164 +24
🚀 New features to boost your workflow:
|
|
✅ DashSDKFFI.xcframework built for this PR.
SwiftPM (host the zip at a stable URL, then use): .binaryTarget(
name: "DashSDKFFI",
url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
checksum: "b5b9cd06ab0f7c645fae9c2836639bfb87567f7a76fbf9049d5deefefca2d719"
)Xcode manual integration:
|
…read errors Address review feedback (@coderabbitai, @thepastaclaw): the test-only `read_level` helper folded any `grove_get_raw_path_query` failure into an empty `BTreeMap` via `unwrap_or_default()`. That could let `collect_subtree_diffs` report "no diffs" when one side's subtree was actually unreadable — a false GREEN. Now a query *error* panics the equivalence guard loudly, while a legitimately empty (Ok) subtree still yields an empty map. Both equivalence tests remain GREEN. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Cumulative review: the sole prior finding (read_level swallowing GroveDB errors) is FIXED at packages/rs-drive-abci/.../v0/mod.rs:703-728 — it now panics on Err while preserving the legitimate empty-Ok case. New latest-delta finding from codex (test-coverage): collect_subtree_diffs reads each platform with its own state.current_platform_version, but platform_b's state remains v11 after transition_to_version_12, so the two sides of the v12 equivalence guard are queried under different drive/grove versions (v12 introduces GROVE_V3). Verified against the source; valid in-scope test-quality suggestion. No blocking issues; production code from prior reviewed SHA is unchanged.
🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-drive-abci/src/execution/platform_events/protocol_upgrade/perform_events_on_first_block_of_protocol_change/v0/mod.rs`:
- [SUGGESTION] packages/rs-drive-abci/src/execution/platform_events/protocol_upgrade/perform_events_on_first_block_of_protocol_change/v0/mod.rs:695-729: Equivalence guard reads each platform under its own (stale) platform version
`read_level` derives the drive version from `platform.state.load().current_platform_version()` for each side. Platform A is built at v12 (`with_initial_protocol_version(12)`) so reads use v12's drive/grove version, but Platform B is built at v11 and `transition_to_version_12` writes to grovedb without bumping `state.current_platform_version`. As a result the upgraded tree is queried with the v11 drive/grove version even though v12 moves Drive to GROVE_V3. The guard is meant to assert that genesis-v12 and upgraded-v12 trees are byte-identical under the target activation version, so both reads should be performed under the same target `PlatformVersion`. Thread the target `PlatformVersion` (the `platform_version_12` already available in the test) into `collect_subtree_diffs`/`read_level` and use its `drive` for both sides instead of pulling it from each platform's in-memory state. The same concern applies to the v11 tripwire test, which mixes a v10-state platform with a v11-state platform.
| fn read_level( | ||
| platform: &crate::platform_types::platform::Platform<crate::rpc::core::MockCoreRPCLike>, | ||
| txn: drive::grovedb::TransactionArg, | ||
| path: &[Vec<u8>], | ||
| ) -> std::collections::BTreeMap<Vec<u8>, Element> { | ||
| let mut q = Query::new(); | ||
| q.insert_all(); | ||
| let pq = PathQuery::new(path.to_vec(), SizedQuery::new(q, None, None)); | ||
| // A *query error* (as opposed to a legitimately empty subtree, which | ||
| // returns Ok with no items) must fail the equivalence guard loudly — | ||
| // swallowing it into an empty map could let `collect_subtree_diffs` | ||
| // report "no diffs" when one side was actually unreadable, producing | ||
| // a false GREEN. An empty Ok result correctly yields an empty map. | ||
| let (results, _) = platform | ||
| .drive | ||
| .grove_get_raw_path_query( | ||
| &pq, | ||
| txn, | ||
| QueryResultType::QueryKeyElementPairResultType, | ||
| &mut vec![], | ||
| &platform | ||
| .state | ||
| .load() | ||
| .current_platform_version() | ||
| .expect("platform version") | ||
| .drive, | ||
| ) | ||
| .unwrap_or_else(|e| { | ||
| panic!( | ||
| "equivalence guard: subtree read at path {:?} must succeed, got: {e}", | ||
| path.iter().map(hex::encode).collect::<Vec<_>>() | ||
| ) | ||
| }); | ||
| results.to_key_elements().into_iter().collect() | ||
| } |
There was a problem hiding this comment.
🟡 Suggestion: Equivalence guard reads each platform under its own (stale) platform version
read_level derives the drive version from platform.state.load().current_platform_version() for each side. Platform A is built at v12 (with_initial_protocol_version(12)) so reads use v12's drive/grove version, but Platform B is built at v11 and transition_to_version_12 writes to grovedb without bumping state.current_platform_version. As a result the upgraded tree is queried with the v11 drive/grove version even though v12 moves Drive to GROVE_V3. The guard is meant to assert that genesis-v12 and upgraded-v12 trees are byte-identical under the target activation version, so both reads should be performed under the same target PlatformVersion. Thread the target PlatformVersion (the platform_version_12 already available in the test) into collect_subtree_diffs/read_level and use its drive for both sides instead of pulling it from each platform's in-memory state. The same concern applies to the v11 tripwire test, which mixes a v10-state platform with a v11-state platform.
source: ['codex']
QuantumExplorer
left a comment
There was a problem hiding this comment.
While I might not have done things this way, it's fine.
Issue being fixed or feature implemented
The v12 shielded pool
[ShieldedBalances]subtree was built two different ways that were NOT byte-identical:Drive::create_initial_state_structure_v3→initial_state_structure_shielded_pool_operations): the main pool[ShieldedBalances, "M"]and its eight children were added to aGroveDbOpBatchand applied viagrove_apply_batch. GroveDB roots a sorted batch at its median key, so the pool's parent Merk was rooted at key[160].Platform::transition_to_version_12): the same elements were inserted one-by-one with sequential breadth-firstgrove_insert_if_not_existscalls, rooting the parent Merk at key[128]— the intended NOTES-at-root layout documented indrive::drive::shielded::paths(SHIELDED_NOTES_KEY = 128sits at the root of the balanced pool tree).AVL rebalancing is insertion-order-sensitive. The result was two different
[ShieldedBalances]subtree root hashes for the same logical structure:[160]layout.[128]layout.These two node populations would compute different app hashes at the v11→v12 boundary block and fork.
Why it's consensus-relevant
genesis ≡ upgradeinvariant — a chain born at v12 must hold the same state as a chain that upgraded into v12 — was broken for the shielded pool.[128]layout — i.e. the live upgrade converges on the right shape. The bug is in the genesis path ([160]), which violates the documented design and the genesis≡upgrade invariant; it would surface for any node that state-syncs or builds a fresh v12 chain.The design intent (
packages/rs-drive/src/drive/shielded/paths.rs) is NOTES at the root ([128]). The sequential/upgrade path matches the design; the genesis/batch path ([160]) was the wrong one.What was done?
Introduced one shared sequential builder and routed both construction paths through it, so the pool is built identically by construction:
Drive::insert_shielded_pool_structure(packages/rs-drive/src/drive/shielded/insert_shielded_pool_structure.rs): inserts the main pool[ShieldedBalances, "M"]SumTree, then the eight children in the exact breadth-first order (NOTES128, NULLIFIERS64, ANCHORS_IN_POOL192, TOTAL_BALANCE32, ANCHORS_BY_HEIGHT96, RECENT_NULLIFIERS160, COMPACTED_NULLIFIERS224, EXPIRATION240), each via a sequentialgrove_insert_if_not_exists. The element types are copied verbatim from the authoritativetransition_to_version_12body.transition_to_version_12: keeps its top-level[ShieldedBalances]SumTree insert and now delegates the pool + eight children to the shared helper. Behavior is unchanged (it already produced[128]).create_initial_state_structure_v3: keeps creating the top-level[ShieldedBalances]SumTree as a standalone insert (already verified identical between paths), removes the shielded ops from theGroveDbOpBatch(deletedinitial_state_structure_shielded_pool_operations), and calls the shared helper aftergrove_apply_batch(so[ShieldedBalances]exists first). Genesis now builds the pool sequentially →[128], matching the upgrade.[128]anymore.v11 is intentionally NOT modified. The same batch-vs-sequential pattern exists at the v10→v11 boundary (
create_initial_state_structure_v2vstransition_to_version_11), but v11 is already activated on mainnet/testnet — changing how v11 builds trees would itself fork the live network. Instead, a read-only equivalence test confirms the two v11 paths already coincide.Files changed
packages/rs-drive/src/drive/shielded/insert_shielded_pool_structure.rs(new — shared builder)packages/rs-drive/src/drive/shielded/mod.rs(register module)packages/rs-drive/src/drive/initialization/v3/mod.rs(genesis: drop batch pool ops, call shared helper after batch apply)packages/rs-drive-abci/src/execution/platform_events/protocol_upgrade/perform_events_on_first_block_of_protocol_change/v0/mod.rs(transition delegates to shared helper; new equivalence tests + sharedcollect_subtree_diffshelper)How Has This Been Tested?
Two new tests drive the real production functions (not a hand-rebuilt batch):
test_genesis_v12_and_upgrade_to_v12_build_identical_shielded_pool(consensus guard)with_initial_protocol_version(12).set_genesis_state()→ realcreate_initial_state_structure_v3).transition_to_version_12.[ShieldedBalances]subtree is byte-identical at every depth (main pool element with its carriedroot_key, all eight children, and everything below), via a recursivecollect_subtree_diffshelper.GENESIS pool root [hex: a0] (160)≠UPGRADE [hex: 80] (128)→ FAILED (path=["34"] key=4d).[hex: 80] (128)→ ok.test_genesis_v11_and_upgrade_to_v11_build_identical_address_trees(read-only v11 tripwire)transition_to_version_11.SavedBlockTransactions(3 children — the batch-vs-sequential risk) andAddressBalancessubtrees are byte-identical. GREEN as-is — the two v11 paths coincidentally already match. The test is documented as a STOP signal: if it ever goes RED, do not edit v11 code (it's live), analyze the discrepancy.Full surrounding suites pass:
cargo test -p drive-abci --lib perform_events_on_first_block_of_protocol_change→ 13 passed.cargo test -p drive --lib initialization→ 8 passed.cargo clippy -p drive --libandcargo clippy -p drive-abci --lib --tests→ clean (no new warnings).No golden/pinned genesis-v12 or shielded root-hash constants exist anywhere in the tree, so nothing else needed updating.
Breaking Changes
No API breaking changes. This is a state-construction change to a not-yet-activated protocol version (v12 genesis). It makes a fresh-genesis-v12 chain's
[ShieldedBalances]subtree byte-identical to an upgraded one (the upgrade shape,[128], is unchanged). v11 construction is untouched, so already-activated networks are unaffected.Checklist:
For repository code-owners and collaborators only
🤖 Generated with Claude Code
Summary by CodeRabbit
Tests
Refactor