fix(dpp): return error instead of panicking on storage-fee refund div-by-zero #3799

Open
QuantumExplorer wants to merge 1 commit into v3.1-dev from claude/fix-storage-fee-divide-by-zero

@QuantumExplorer
Member

@QuantumExplorer QuantumExplorer commented Jun 4, 2026

Important

⚠️ Partial fix — relocates the halt, does not restore liveness (found in adversarial review)

This PR removes the panic, which is worthwhile on its own: a panic risks unwinding across FFI or an abort, and is caught only by the node's panic hook. It does not restore liveness, though. Returning Err from original_removed_credits_multiplier_from propagates up the consensus path: restore_original_removed_credits_amount → refund_storage_fee_to_epochs_map → add_distribute_storage_fee_to_epochs_operations → process_block_fees_and_validate_sum_trees → run_block_proposal_v0 (the ? returns the outer Err; it is never folded into ValidationResult.errors) → process_proposal (the ? short-circuits before the graceful if !run_result.is_valid() reject path) → ResponseException. Per the ABCI spec, CometBFT/Tenderdash panics on a ResponseException. So every validator still halts deterministically at the same block; the halt just moves from drive-abci to Tenderdash.
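The difference between the two failure channels described above can be sketched roughly as follows. This is a std-only illustration: `Response`, `ValidationOutcome`, `run_block_proposal`, and `process_proposal` are hypothetical stand-ins, not the real drive-abci types or signatures.

```rust
// Hypothetical stand-ins for the real drive-abci types; names and shapes are assumptions.
#[derive(Debug, PartialEq)]
enum Response {
    Accept,
    Reject,    // graceful path: the proposal is rejected and the node keeps running
    Exception, // outer Err: surfaced to the consensus engine as an exception
}

struct ValidationOutcome {
    errors: Vec<String>,
}

impl ValidationOutcome {
    fn is_valid(&self) -> bool {
        self.errors.is_empty()
    }
}

// Either fails outright (outer Err) or completes with a list of validation errors.
fn run_block_proposal(fee_math_fails: bool, invalid_tx: bool) -> Result<ValidationOutcome, String> {
    if fee_math_fails {
        return Err("divide by zero in storage-fee refund".to_string());
    }
    let mut errors = Vec::new();
    if invalid_tx {
        errors.push("invalid state transition".to_string());
    }
    Ok(ValidationOutcome { errors })
}

fn process_proposal_inner(fee_math_fails: bool, invalid_tx: bool) -> Result<Response, String> {
    // `?` returns early on the outer Err, so the `is_valid` check below never sees it.
    let run_result = run_block_proposal(fee_math_fails, invalid_tx)?;
    if !run_result.is_valid() {
        return Ok(Response::Reject);
    }
    Ok(Response::Accept)
}

fn process_proposal(fee_math_fails: bool, invalid_tx: bool) -> Response {
    // Any outer Err becomes an exception response rather than a rejection.
    process_proposal_inner(fee_math_fails, invalid_tx).unwrap_or(Response::Exception)
}
```

In this sketch only errors collected in `ValidationOutcome.errors` reach the `Reject` branch; an `Err` from the fee math always ends as `Exception`.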

Proper fix (deferred — needs a fee-model semantics decision): for a fully-amortized refund (current_era >= PERPETUAL_STORAGE_ERAS, i.e. the storage cost has been entirely distributed over the full 50-era window), return a multiplier of 1 (refund the amount unchanged) so the block processes and the chain continues, instead of erroring. The near-boundary multiplier explosion (current_era == 49 → ~32000×) should likely be clamped too.
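A minimal sketch of what that deferred behavior could look like, assuming a made-up 4-era table in basis points instead of the real 50-entry FEE_DISTRIBUTION_TABLE, integer fixed-point instead of Decimal, and an arbitrary cap (`MAX_MULTIPLIER_MILLI`). All names and values here are hypothetical; the real bound is a fee-model decision.

```rust
// Hypothetical 4-era table in basis points (sums to 10_000); values are illustrative only.
const TABLE_BPS: [u64; 4] = [4_000, 3_000, 2_000, 1_000];
const TOTAL_BPS: u64 = 10_000;

// Assumed cap, expressed in thousandths (5_000 = 5x).
const MAX_MULTIPLIER_MILLI: u64 = 5_000;

/// Returns the refund multiplier in thousandths (1_000 = 1x).
fn refund_multiplier_milli(current_era: usize) -> u64 {
    // Share of the original fee that has not been distributed yet at `current_era`.
    let remaining_bps: u64 = TABLE_BPS.iter().skip(current_era).sum();
    if remaining_bps == 0 {
        // Fully amortized: refund the amount unchanged instead of erroring.
        return 1_000;
    }
    // Clamp the near-boundary explosion, where very little of the fee remains.
    (1_000 * TOTAL_BPS / remaining_bps).min(MAX_MULTIPLIER_MILLI)
}
```

With these illustrative numbers the last era would give 10x unclamped, so the cap applies there, and any era past the table returns 1x.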

Also missed: the same three functions contain five more unguarded divide-by-zeros if epochs_per_era == 0 (paid_epochs / epochs_per_era, paid_epochs % epochs_per_era, and three Decimal divisions in original_removed_credits_multiplier_from / refund_storage_fee_to_epochs_map / distribution_storage_fee_to_epochs_map). epochs_per_era is a plain u16 deserialized from config with no zero-check (no NonZeroU16), so a misconfigured node panics. These should be guarded (or epochs_per_era made a validated NonZeroU16).
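One way to rule out the zero case at load time is sketched below. `FeeConfig`, `from_raw`, and `era_and_offset` are hypothetical names for illustration; the real config type and its deserialization are not shown in this PR.

```rust
use std::num::NonZeroU16;

// Hypothetical config wrapper; the real config type and field layout are assumptions.
struct FeeConfig {
    epochs_per_era: NonZeroU16,
}

impl FeeConfig {
    // Reject zero once, when the config is loaded, instead of guarding every division site.
    fn from_raw(epochs_per_era: u16) -> Result<Self, String> {
        NonZeroU16::new(epochs_per_era)
            .map(|epochs_per_era| FeeConfig { epochs_per_era })
            .ok_or_else(|| "epochs_per_era must be non-zero".to_string())
    }

    // Split paid epochs into whole eras and the remainder; the divisor is never zero here.
    fn era_and_offset(&self, paid_epochs: u16) -> (u16, u16) {
        let per_era = self.epochs_per_era.get();
        (paid_epochs / per_era, paid_epochs % per_era)
    }
}
```

A misconfigured value then fails at startup with a readable error rather than panicking later on the consensus path.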

The checked_sub/is_zero() changes here are still worth keeping (they remove the unguarded panic and the underflow), but this PR should be treated as a mitigation, not the liveness fix. The is_zero() check is correct (the empty table sum is exactly Decimal::ZERO, not a denormal).


Issue being fixed or feature implemented

Latent chain-halt panic in storage-fee refund accounting.

original_removed_credits_multiplier_from (fee/epoch/distribution.rs) computed dec!(1) / ratio_used. FEE_DISTRIBUTION_TABLE has exactly PERPETUAL_STORAGE_ERAS (50) entries, so once a refund's original storage epoch is at least that whole window behind the repayment epoch (current_era >= 50), every table era compares Ordering::Less, the iterator yields nothing, and ratio_used sums to zero.
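The empty-sum condition can be reproduced with a small stand-in table. This is a sketch under stated assumptions: `SHARES_BPS` is a hypothetical 4-entry table in basis points (the real table has 50 Decimal entries), and `ratio_used_bps` only mirrors the filtering described above, not the actual implementation.

```rust
use std::cmp::Ordering;

// Hypothetical 4-era table in basis points; the real FEE_DISTRIBUTION_TABLE has 50 entries.
const SHARES_BPS: [u32; 4] = [4_000, 3_000, 2_000, 1_000];

// Sum the shares of every era at or after `current_era`.
fn ratio_used_bps(current_era: u16) -> u32 {
    SHARES_BPS
        .iter()
        .enumerate()
        .filter_map(|(era, share)| match (era as u16).cmp(&current_era) {
            Ordering::Less => None, // era already elapsed: contributes nothing
            _ => Some(*share),
        })
        .sum() // an empty iterator sums to 0
}
```

Once `current_era` reaches the table length, every entry is filtered out and the sum is exactly zero, which is the divisor that triggered the panic.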

rust_decimal::Decimal's / operator panics on a zero divisor, like integer division but unlike f64 division (which yields inf or NaN) and unlike its own checked_div. This call runs on the consensus path (process_block_fees_and_validate_sum_trees at epoch change, via restore_original_removed_credits_amount ← refund_storage_fee_to_epochs_map). A panic there aborts every node simultaneously and halts the chain. It is not reachable today (it needs the chain to have run ~50 eras with a document surviving the full perpetual-storage window), but it is a guaranteed liveness failure as the chain ages, with no graceful error path.

What was done?

original_removed_credits_multiplier_from now returns Result<Decimal, ProtocolError>:

  • Returns ProtocolError::DivideByZero when ratio_used == 0 (refund older than the entire perpetual-storage window) instead of panicking.
  • Guards the paid_epochs = start_repayment - start subtraction with checked_sub → ProtocolError::Overflow, so a repayment epoch before the original storage epoch can't underflow (panic in debug / wrap in release) either.
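The shape of the two guards can be sketched with std-only types. `SketchError` is a local stand-in for ProtocolError, the 4-entry `SHARES_BPS` table and the thousandths-based result are assumptions made for illustration, and the `checked_div` on `epochs_per_era` is an extra guard that this PR does not add.

```rust
// Local stand-in for the real ProtocolError; only the two variants named above are modelled.
#[derive(Debug, PartialEq)]
enum SketchError {
    DivideByZero(&'static str),
    Overflow(&'static str),
}

// Hypothetical 4-era table in basis points (sums to 10_000).
const SHARES_BPS: [u64; 4] = [4_000, 3_000, 2_000, 1_000];

/// Returns the multiplier in thousandths (1_000 = 1x), or an error instead of panicking.
fn multiplier_milli(start: u16, start_repayment: u16, epochs_per_era: u16) -> Result<u64, SketchError> {
    // Guard 1: the repayment epoch must not precede the original storage epoch.
    let paid_epochs = start_repayment
        .checked_sub(start)
        .ok_or(SketchError::Overflow("repayment epoch precedes the original storage epoch"))?;

    // Extra guard, not part of the PR: a zero epochs_per_era would also divide by zero.
    let current_era = paid_epochs
        .checked_div(epochs_per_era)
        .ok_or(SketchError::DivideByZero("epochs_per_era is zero"))?;

    // Guard 2: past the end of the table nothing remains, so the divisor would be zero.
    let ratio_used: u64 = SHARES_BPS.iter().skip(current_era as usize).sum();
    if ratio_used == 0 {
        return Err(SketchError::DivideByZero("refund is older than the whole storage window"));
    }
    Ok(1_000 * 10_000 / ratio_used)
}
```

Callers that already return a Result can then forward either error with `?`.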

The sole production caller restore_original_removed_credits_amount already returns a Result and now propagates with ?. Test call sites updated to .expect(...).

How Has This Been Tested?

cargo test -p dpp --lib distribution → 455 passed, 0 failed, including two new tests:

  • should_return_error_instead_of_panicking_when_window_fully_elapsed: asserts original_removed_credits_multiplier_from(0, 1000, 20) (current_era = 50) returns Err(DivideByZero) and that restore_original_removed_credits_amount propagates it rather than panicking.
  • should_return_error_when_repayment_epoch_precedes_start: asserts the underflow guard returns Err(Overflow).

cargo fmt --all clean.

Breaking Changes

None. Internal fee-distribution helper signature only; behavior is unchanged for all in-window refunds (the only difference is a clean error instead of a panic in the out-of-window edge case).

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed

For repository code-owners and collaborators only

  • I have assigned this pull request to a milestone

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened validation in credit calculation logic to properly detect and handle edge cases that could cause calculation failures
    • Enhanced error handling in credit restoration operations to prevent unexpected failures from invalid parameter combinations
    • Added safeguards against mathematical edge cases that could produce invalid results or unexpected behavior

…-by-zero

`original_removed_credits_multiplier_from` computed `dec!(1) / ratio_used`.
`FEE_DISTRIBUTION_TABLE` has exactly `PERPETUAL_STORAGE_ERAS` (50) entries,
so once a refund's original storage epoch is at least that whole window
behind the repayment epoch (`current_era >= 50`), every table era is
`Less`, the iterator yields nothing, and `ratio_used` sums to zero.
`rust_decimal::Decimal`'s `/` operator PANICS on a zero divisor (like
integer division, but unlike `f64` division and its own `checked_div`). On the
consensus path (`process_block_fees_and_validate_sum_trees` at epoch
change) this would abort every node simultaneously and halt the chain.
Not reachable today, but a guaranteed liveness failure as the chain ages.

The function now returns `Result<Decimal, ProtocolError>`: it returns
`ProtocolError::DivideByZero` when the window is fully elapsed and
guards the `paid_epochs` subtraction with `checked_sub`
(`ProtocolError::Overflow`) so a repayment epoch before the original
storage epoch can't underflow either. The sole production caller
(`restore_original_removed_credits_amount`) already returns a Result and
propagates with `?`. Test call sites updated to `.expect(...)`; adds
tests for both error paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026

📝 Walkthrough

The fee epoch distribution module refactors original_removed_credits_multiplier_from to return a Result type, adding explicit overflow checking via checked_sub for epoch computation and zero-division guards. The caller and all unit tests are updated to handle the new error types.

Changes

Credit Multiplier Error Handling

| Layer / File(s) | Summary |
|---|---|
| Core error handling in original_removed_credits_multiplier_from<br>packages/rs-dpp/src/fee/epoch/distribution.rs | The function now returns Result<Decimal, ProtocolError>. Epoch subtraction uses checked_sub and returns ProtocolError::Overflow on underflow; the ratio computation guards against zero and returns ProtocolError::DivideByZero instead of panicking. |
| Error propagation in restore_original_removed_credits_amount<br>packages/rs-dpp/src/fee/epoch/distribution.rs | The caller now propagates errors from original_removed_credits_multiplier_from using the ? operator rather than assuming success. |
| Test suite updates<br>packages/rs-dpp/src/fee/epoch/distribution.rs | All existing tests calling original_removed_credits_multiplier_from are updated to handle the Result return via .expect(...); new tests validate DivideByZero on fully-elapsed perpetual windows and Overflow when repayment epoch precedes start epoch. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

ready for final review

Suggested reviewers

  • shumkov

Poem

🐰 Checked math hops through epochs true,
No overflow or panic too,
Errors caught before they grow,
Safe division, tested so.
Fee computations shine anew!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: converting a panic on division-by-zero in storage-fee refund handling into a proper error return. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@thepastaclaw
Collaborator

thepastaclaw commented Jun 4, 2026

✅ Review complete (commit 34fe261)

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/rs-dpp/src/fee/epoch/distribution.rs (1)

225-229: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix unchecked u16 subtraction overflow in refund_storage_fee_to_epochs_map

EpochIndex is u16, and refund_storage_fee_to_epochs_map computes start_era via skip_until_epoch_index - start_epoch_index before reaching restore_original_removed_credits_amount(...), so when current_epoch_index + 1 < start_epoch_index this can overflow (panic in debug / wrap in release) instead of returning a clean error. Add a checked subtraction for start_era and add a caller-level regression test through subtract_refunds_from_epoch_credits_collection for the invalid epoch ordering case.

🛠️ Suggested fix
-    let start_era: u16 = (skip_until_epoch_index - start_epoch_index) / epochs_per_era;
+    let paid_epochs = skip_until_epoch_index
+        .checked_sub(start_epoch_index)
+        .ok_or(ProtocolError::Overflow(
+            "start repayment epoch is before the original storage epoch",
+        ))?;
+    let start_era: u16 = paid_epochs / epochs_per_era;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/rs-dpp/src/fee/epoch/distribution.rs` around lines 225 - 229,
refund_storage_fee_to_epochs_map performs subtraction on EpochIndex (u16) that
can underflow when skip_until_epoch_index < start_epoch_index; change the
subtraction that computes start_era to use a checked subtraction (e.g.,
checked_sub) and return the existing error type (or convert to the appropriate
Err) when it returns None, so the function returns a clean error instead of
panicking/wrapping; update the call sites if needed (e.g., where
restore_original_removed_credits_amount /
original_removed_credits_multiplier_from are used) to propagate the error, and
add a caller-level regression test named
subtract_refunds_from_epoch_credits_collection that supplies an invalid epoch
ordering (current_epoch_index + 1 < start_epoch_index) and asserts the function
returns the expected error.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 86cb3154-7c23-4b23-bb52-928c2ed42458

📥 Commits

Reviewing files that changed from the base of the PR and between 25e6c1b and 34fe261.

📒 Files selected for processing (1)
  • packages/rs-dpp/src/fee/epoch/distribution.rs

@codecov

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 94.91525% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.19%. Comparing base (9eca622) to head (34fe261).
⚠️ Report is 5 commits behind head on v3.1-dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/rs-dpp/src/fee/epoch/distribution.rs | 94.91% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           v3.1-dev    #3799    +/-   ##
==========================================
  Coverage     87.18%   87.19%            
==========================================
  Files          2624     2624            
  Lines        321014   321228   +214     
==========================================
+ Hits         279892   280085   +193     
- Misses        41122    41143    +21     
| Components | Coverage Δ |
|---|---|
| dpp | 87.74% <94.91%> (+<0.01%) ⬆️ |
| drive | 86.05% <ø> (+<0.01%) ⬆️ |
| drive-abci | 89.54% <ø> (+<0.01%) ⬆️ |
| sdk | ∅ <ø> (∅) |
| dapi-client | ∅ <ø> (∅) |
| platform-version | ∅ <ø> (∅) |
| platform-value | 92.17% <ø> (ø) |
| platform-wallet | ∅ <ø> (∅) |
| drive-proof-verifier | 47.85% <ø> (ø) |
Collaborator

@thepastaclaw thepastaclaw left a comment

Code Review

Targeted, well-scoped fix that converts a latent Decimal divide-by-zero panic in original_removed_credits_multiplier_from into a propagable ProtocolError::DivideByZero. The core fix is sound and consensus-safe. One suggestion: the companion defensive checked_sub for paid_epochs is bypassed by an unchecked subtraction one frame up in refund_storage_fee_to_epochs_map (line 311), so the underflow guard is unreachable on the production path. The deeper semantic question of whether aged-out refunds should be no-ops rather than errors is acknowledged as out-of-scope by the PR description.

🟡 1 suggestion(s)

1 additional finding(s) omitted (not in diff).

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-dpp/src/fee/epoch/distribution.rs`:
- [SUGGESTION] packages/rs-dpp/src/fee/epoch/distribution.rs:311: New `checked_sub` underflow guard is bypassed by an earlier unchecked subtraction on the same call path
  The PR adds a `checked_sub` in `original_removed_credits_multiplier_from` (lines 174-178) explicitly to ensure corrupted/unexpected inputs return `ProtocolError::Overflow` rather than panicking. The sole production caller, `refund_storage_fee_to_epochs_map`, performs the same subtraction one frame up at line 311: `let start_era: u16 = (skip_until_epoch_index - start_epoch_index) / epochs_per_era;`. This runs before `restore_original_removed_credits_amount` (line 317) and therefore before the new guard ever sees the inputs. If `skip_until_epoch_index < start_epoch_index` ever occurred, line 311 would panic (debug) or wrap (release) first, so the guard is unreachable for the underflow case it's meant to defend. Either guard line 311 with `checked_sub` for a consistent defensive posture, or drop the underflow guard (and its test) since the invariant is structurally enforced. The divide-by-zero portion of the fix is unaffected by this and remains correct.
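To make the "panic (debug) or wrap (release)" behavior concrete, the sketch below contrasts a wrapping subtraction, which is what an unchecked `a - b` on u16 computes when overflow checks are disabled, with a checked one. The function names are hypothetical and the inputs are arbitrary; this is not the code at line 311.

```rust
// Mirrors what an unchecked `skip_until - start` yields when overflow checks are off:
// the u16 subtraction wraps around instead of failing.
// Note: this still panics if `epochs_per_era` is zero.
fn start_era_wrapping(skip_until: u16, start: u16, epochs_per_era: u16) -> u16 {
    skip_until.wrapping_sub(start) / epochs_per_era
}

// Checked variant: returns None for an inverted epoch ordering or a zero divisor.
fn start_era_checked(skip_until: u16, start: u16, epochs_per_era: u16) -> Option<u16> {
    let paid_epochs = skip_until.checked_sub(start)?;
    paid_epochs.checked_div(epochs_per_era)
}
```

With `skip_until = 5` and `start = 10`, the wrapping variant silently produces a start era in the thousands, while the checked variant reports the inverted ordering as `None`.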

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

✅ DashSDKFFI.xcframework built for this PR.

SwiftPM (host the zip at a stable URL, then use):

.binaryTarget(
  name: "DashSDKFFI",
  url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
  checksum: "6986038bb0917700a4002b1c55a04e4d9ab8834e1fe2a17f5daf69153e2a452d"
)

Xcode manual integration:

  • Download 'DashSDKFFI.xcframework' artifact from the run link above.
  • Drag it into your app target (Frameworks, Libraries & Embedded Content) and set Embed & Sign.
  • If using the Swift wrapper package, point its binaryTarget to the xcframework location or add the package and place the xcframework at the expected path.


2 participants