Skip to content

test: verify real signatures in fork-choice spec tests#304

Open
MegaRedHand wants to merge 1 commit intopr-fixture-schemafrom
pr-real-sig-verify
Open

test: verify real signatures in fork-choice spec tests#304
MegaRedHand wants to merge 1 commit intopr-fixture-schemafrom
pr-real-sig-verify

Conversation

@MegaRedHand
Copy link
Copy Markdown
Collaborator

@MegaRedHand MegaRedHand commented Apr 17, 2026

Summary

Swap the fork-choice harness's _without_verification shortcuts for real signature verification where it makes sense, and collapse the aggregated-attestation paths back into a single verifying entry point.

store.rs

  • Delete on_gossip_attestation_without_verification — the test harness now uses the real verifying on_gossip_attestation.
  • on_block / on_block_without_verification / on_block_core(verify) are unchanged. Fork-choice fixtures don't ship block signatures, so the test harness still calls on_block_without_verification.

forkchoice_spectests.rs

  • attestation steps now decode the fixture's real XMSS signature and call on_gossip_attestation(&signed, is_aggregator).
  • gossipAggregatedAttestation steps now carry the real ~180 KB aggregated proof bytes through to on_gossip_aggregated_attestation.
  • Factor out assert_step_outcome(step_idx, expected_valid, result) to replace three near-duplicate match blocks.

test-fixtures crate

  • Hoist deser_xmss_hex out of signature_types.rs into the shared ethlambda-test-fixtures crate so both test binaries can use it.
  • AttestationStepData::signature deserializes straight to Option<XmssSignature> via a new deser_opt_xmss_hex wrapper.

SKIP_TESTS rewrite

Replaces the previous placeholder skip list with 13 fixtures whose aggregated XMSS proofs are rejected with AggregateVerificationFailed(ProofError(InvalidProof)). Root cause: our verifier pins leanEthereum/leanMultisig@2eb4b9d while the fixtures are generated by anshalshukla/leanMultisig@devnet-4. The two forks disagree on:

  1. What goes into the Fiat-Shamir transcript as public_input (full vec vs a single 8-field digest).
  2. The rec_aggregation bytecode program itself (different hash).

Either mismatch alone invalidates every proof. Unblocking those 13 tests requires a coordinated devnet decision: repin to anshalshukla/leanMultisig@devnet-4 (matches zeam/ream/grandine/lantern), or upgrade the whole devnet to leanEthereum/leanMultisig main and regenerate fixtures.

Un-skipped now that we verify signatures

  • test_gossip_attestation_with_invalid_signature — real signature verification correctly rejects the deliberately-bad fixture signature.
  • test_equivocating_proposer_with_split_attestations — real verification accepts the valid attestation path.
  • The three fixtures test: adapt fork-choice harness to refreshed fixture format #303 temporarily skipped for the proof-participants routing gap.

Stack

Stacked on #303. Merge after that.

Test plan

  • cargo test --workspace --release passes (314 passed, 6 ignored)
  • signature_spectests still passes — hoisted deser_xmss_hex matches the previous local definition
  • CI is green

@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

Overall Assessment: Correct refactoring that removes test-only bypass helpers in favor of real verification paths. The consensus logic is preserved, but ensure the implicit bounds check via pubkey collection is intentional.


1. Security/Correctness: Validator Bounds Check Location

File: crates/blockchain/src/store.rs
Lines: 620-630 (new)

The bounds check for validator indices has moved from an explicit any(|vid| vid >= num_validators) loop into the pubkey collection iterator:

.map(|vid| {
    target_state
        .validators
        .get(vid as usize)
        .map(|v| v.pubkey.clone())
        .ok_or(StoreError::InvalidValidatorIndex)
})

This correctly rejects out-of-bounds indices with InvalidValidatorIndex, but ensure this behavior is preserved if the participant_indices() iterator is later modified


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Now I have enough context to write the full review.


PR #304test: verify real signatures in fork-choice spec tests

Overview

This PR removes two test-only bypass paths (on_gossip_attestation_without_verification, on_gossip_aggregated_attestation_core) and wires the fork-choice spec test harness to the same production entry points that live code uses. It also inlines on_gossip_aggregated_attestation_core back into its single caller and hoists deser_xmss_hex to the shared test-fixtures crate. The motivation is sound: test-only shortcuts were masking real verification paths and diverging semantically from the production pipeline.


Code Quality & Correctness

Good: double work eliminated

The old architecture had on_gossip_aggregated_attestation perform target-state lookup + bounds check, then call on_gossip_aggregated_attestation_core which did the same target-state lookup + bounds check a second time. Inlining removes this redundancy.

Good: validate_attestation_data moved earlier

It's now the first thing called in on_gossip_aggregated_attestation (store.rs:612), before the expensive get_state call. Clean fail-fast improvement.

Good: assert_step_outcome helper

Replaces three identical 12-line match blocks with a single clean generic helper. No behaviour change, just a readability win.

Good: deser_xmss_hex hoisting

Moving from signature_types.rs to ethlambda-test-fixtures and re-exporting via common.rs's blanket pub use ethlambda_test_fixtures::* is the right factoring.


Issues & Concerns

1. slot variable shadow with a type change (store.rs:637 and :655)

let slot: u32 = aggregated.data.slot.try_into().expect("slot exceeds u32");
// ... verification ...
let slot = aggregated.data.slot;   // shadows the u32 with the original u64

The shadow is intentional (the u32 is only needed for the verifier call; the u64 is used for logging), but it's subtle. Consider slot_u32 / slot_for_verification for the narrowed copy so the type difference is explicit to future readers.

2. SKIP_TESTS grows from 3 → 13

The root-cause explanation (two leanMultisig forks disagree on Fiat-Shamir public_input encoding and aggregation bytecode) is thorough and accurate. However, 13 skipped tests is a meaningful coverage gap. The failing tests include high-value scenarios like test_safe_target_advances_incrementally_along_the_chain, test_finalization_prunes_stale_aggregated_payloads, test_valid_gossip_aggregated_attestation. A tracking issue for the repin decision (repinning to anshalshukla/leanMultisig@devnet-4 to match zeam/ream/grandine/lantern) would help ensure this doesn't remain indefinitely.

3. HexByteList::from panics on malformed fixture data (types.rs:93)

impl From<HexByteList> for Vec<u8> {
    fn from(value: HexByteList) -> Self {
        ...
        hex::decode(stripped).expect("invalid hex in proof data")
    }
}

A panic here produces an opaque test failure pointing at an impl block rather than a fixture path. Since fixture deserialization already runs through serde, returning a serde error would give a better error message — or at minimum add the fixture content to the expect message. Minor for test code, but in a codebase with many fixtures it saves debugging time.

4. Behavioural change for attestation steps not yet visible in skipped tests

on_gossip_attestation_without_verification inserted directly into known_attestations and called update_head immediately. The new path flows through on_gossip_attestation → gossip-signature store (if is_aggregator) → promotion happens only at tick intervals 0/4. Tests that have an attestation step followed by a fork-choice check without an intervening tick would silently see a different head. Worth confirming none of the currently-passing fixtures rely on the immediate-head-update semantics of the old bypass.

5. ByteList::try_from expect message is slightly misleading (forkchoice_spectests.rs:152)

let proof_data = ByteList::try_from(proof_bytes)
    .expect("aggregated proof data fits in ByteListMiB");

The failure mode here isn't that the data "fits" — it's that try_from returns Err when the length exceeds the cap. "aggregated proof data exceeds ByteListMiB capacity" would be a more accurate panic message.


Minor Style Notes

  • The extra blank lines added around store.insert_new_aggregated_payload(...) and the info!() block are inconsistent with the surrounding function style; they can be dropped.
  • PR description says "Delete on_block_without_verification" but that function didn't exist by that name — on_block_core is kept (intentionally, since block fixtures don't ship signatures). The description is slightly misleading on this point.

Summary

The design direction is correct: real verification paths in tests catch real bugs, and the two removed bypass functions were genuine testing liabilities. The inlining of on_gossip_aggregated_attestation_core is clean, the assert_step_outcome refactor is a clear win, and the deser_xmss_hex hoisting is the right call.

The main items to address before merging: confirm the attestation-step behavioural change doesn't silently affect passing tests (Point 4), and track the leanMultisig repin as a concrete follow-up (Point 2). The slot shadow (Point 1) and HexByteList panic message (Points 3 and 5) are minor.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

  1. crates/blockchain/tests/forkchoice_spectests.rs:97 ignores the new isAggregator fixture field and always calls store::on_tick(..., false). That breaks spec coverage for any interval-2 aggregation path, because crates/blockchain/src/store.rs:466 only aggregates when is_aggregator is true and does the work at crates/blockchain/src/store.rs:508. The newly skipped safe-target / pending-attestation tests are consistent with this harness bug, not just the leanMultisig mismatch.

  2. crates/blockchain/tests/forkchoice_spectests.rs:124 defaults attestation steps to is_aggregator = false. In production, non-aggregators validate and then drop gossip attestations; they only persist them when is_aggregator is true at crates/blockchain/src/store.rs:579. That means any existing fixture without isAggregator now turns a previously state-mutating attestation step into a no-op, which changes fork-choice semantics and explains why so many consensus tests had to be skipped.

  3. The new aggregated-proof parsing path can panic on invalid fixtures instead of reporting an expected step failure. crates/blockchain/tests/types.rs:91 uses expect("invalid hex in proof data"), and crates/blockchain/tests/forkchoice_spectests.rs:141 uses expect("aggregated proof data fits in ByteListMiB"). For valid: false vectors with malformed or oversized proof payloads, the harness will abort before assert_step_outcome runs.

I couldn’t run the test suite here: cargo needs writable ~/.cargo/~/.rustup and network access for the leanMultisig dependency, both blocked in this sandbox.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 17, 2026

Greptile Summary

This PR removes the _without_verification test shortcuts from the fork-choice harness, wiring attestation and gossipAggregatedAttestation steps to the real on_gossip_attestation / on_gossip_aggregated_attestation functions with live XMSS signature checks. On the production side, on_block_core loses its verify: bool flag and on_block becomes a thin verifying wrapper; deser_xmss_hex is hoisted into the shared ethlambda-test-fixtures crate. The 13 skipped fixtures are clearly documented with a root-cause explanation (leanMultisig fork mismatch) and a path to resolution.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style and robustness suggestions that do not affect correctness.

No P0 or P1 issues found. The production code changes are mechanical (flag removal, function deletion, inlining), and the test harness now exercises real cryptographic paths. All three concerns flagged are P2: panics instead of structured errors in test code, silently ignored is_aggregator on tick steps, and a double parent-state lookup in on_block. The SKIP_TESTS rationale is clear and does not mask regressions.

crates/blockchain/tests/forkchoice_spectests.rs — minor harness robustness issues (.expect() panics and is_aggregator ignored for tick steps).

Important Files Changed

Filename Overview
crates/blockchain/src/store.rs Removed verify: bool flag from on_block_core; on_block is now a thin verifying wrapper; on_block_without_verification and on_gossip_attestation_without_verification deleted; on_gossip_aggregated_attestation inlined. Minor: on_block fetches parent state twice (once for verification, once inside on_block_core for the transition).
crates/blockchain/tests/forkchoice_spectests.rs Harness now calls real on_gossip_attestation and on_gossip_aggregated_attestation with fixture signatures. Block steps keep using on_block_core (no block signatures in fixtures). Two style issues: .expect() panics instead of datatest_stable::Result errors for missing optional fields; step.is_aggregator silently ignored for tick steps.
crates/blockchain/tests/types.rs Adds deser_opt_xmss_hex wrapper for Option<XmssSignature> deserialization; AttestationStepData.signature now deserialises to the real XMSS bytes; imports deser_xmss_hex from the shared common module.
crates/blockchain/tests/signature_types.rs Drops the local deser_xmss_hex definition; now imports it from super::common (the shared test-fixtures crate). Clean de-duplication.
crates/common/test-fixtures/src/lib.rs Hoists deser_xmss_hex (previously in signature_types.rs) into this shared crate so both forkchoice_spectests and signature_spectests can reuse it; adds XmssSignature and SIGNATURE_SIZE imports.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming block via gossip] --> B[on_block]
    B --> C{Duplicate?}
    C -- yes --> D[Return Ok early]
    C -- no --> E[Fetch parent_state]
    E --> F[verify_signatures]
    F -- fail --> G[Return Err]
    F -- ok --> H[on_block_core]

    I[Test harness block step] --> H

    H --> J{Duplicate?}
    J -- yes --> D2[Return Ok early]
    J -- no --> K[Fetch parent_state again]
    K --> L[state_transition]
    L --> M[Store block + state]
    M --> N[update_head fork choice]

    P[Incoming attestation via gossip] --> Q[on_gossip_attestation]
    Q --> R[validate_attestation_data]
    R --> S[verify XMSS signature]
    S -- invalid --> T[Return Err]
    S -- valid --> U{is_aggregator?}
    U -- yes --> V[insert_gossip_signature]
    U -- no --> W[Drop - propagation only]

    X[Incoming aggregated attestation] --> Y[on_gossip_aggregated_attestation]
    Y --> Z[validate_attestation_data]
    Z --> AA[verify_aggregated_signature]
    AA -- fail --> AB[Return Err]
    AA -- ok --> AC[insert_new_aggregated_payload]

    style I fill:#f0ad4e,color:#000
    style H fill:#5bc0de,color:#000
    style B fill:#5cb85c,color:#000
Loading

Comments Outside Diff (2)

  1. crates/blockchain/tests/forkchoice_spectests.rs, line 109-110 (link)

    P2 is_aggregator from tick steps silently dropped

    store::on_tick is always called with false as the is_aggregator argument for tick steps. The step.is_aggregator field parsed from the fixture is used only for attestation steps. If a fixture ever specifies "isAggregator": true on a tick step to drive aggregation at interval 2, the field would be silently ignored and aggregate_committee_signatures would never run during that tick, causing the test to produce unexpected state without any warning.

    Consider using step.is_aggregator.unwrap_or(false) here to respect the fixture's intent:

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/blockchain/tests/forkchoice_spectests.rs
    Line: 109-110
    
    Comment:
    **`is_aggregator` from tick steps silently dropped**
    
    `store::on_tick` is always called with `false` as the `is_aggregator` argument for `tick` steps. The `step.is_aggregator` field parsed from the fixture is used only for `attestation` steps. If a fixture ever specifies `"isAggregator": true` on a `tick` step to drive aggregation at interval 2, the field would be silently ignored and `aggregate_committee_signatures` would never run during that tick, causing the test to produce unexpected state without any warning.
    
    Consider using `step.is_aggregator.unwrap_or(false)` here to respect the fixture's intent:
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. crates/blockchain/src/store.rs, line 679-699 (link)

    P2 Double parent-state lookup between on_block and on_block_core

    on_block fetches parent_state from storage (line 688–695) to pass to verify_signatures, then on_block_core fetches it again (lines 721–728) for the state transition. Storage reads are cheap for the in-memory backend used in tests, but on RocksDB each call deserializes the full state (which can be large). Since on_block already owns the state after the lookup, passing it through to on_block_core (or caching it) would avoid the duplicate deserialisation in the production path.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/blockchain/src/store.rs
    Line: 679-699
    
    Comment:
    **Double parent-state lookup between `on_block` and `on_block_core`**
    
    `on_block` fetches `parent_state` from storage (line 688–695) to pass to `verify_signatures`, then `on_block_core` fetches it again (lines 721–728) for the state transition. Storage reads are cheap for the in-memory backend used in tests, but on RocksDB each call deserializes the full state (which can be large). Since `on_block` already owns the state after the lookup, passing it through to `on_block_core` (or caching it) would avoid the duplicate deserialisation in the production path.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/blockchain/tests/forkchoice_spectests.rs
Line: 113-123

Comment:
**`.expect()` panics instead of structured test failures**

The `.expect("attestation step missing signature")` call on line 122 will panic rather than produce a clean `datatest_stable::Result` error if a fixture's `attestation` step is missing the `signature` field (which defaults to `None` via `#[serde(default)]`). A panic in a `datatest_stable` harness shows an unhelpful backtrace rather than a message like `"Step N: attestation step missing signature"`. Similarly, `validator_id.expect(...)` on line 118 has the same risk for fixtures with `valid: false` that omit the field.

Consider replacing these with `ok_or_else(|| "...".into())?` to keep failures within `datatest_stable::Result`:

```suggestion
                "attestation" => {
                    let att_data = step
                        .attestation
                        .expect("attestation step missing attestation data");
                    let signed_attestation = SignedAttestation {
                        validator_id: att_data
                            .validator_id
                            .ok_or_else(|| format!("Step {step_idx}: attestation step missing validator_id"))?,
                        data: att_data.data.into(),
                        signature: att_data
                            .signature
                            .ok_or_else(|| format!("Step {step_idx}: attestation step missing signature"))?,
                    };
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/blockchain/tests/forkchoice_spectests.rs
Line: 109-110

Comment:
**`is_aggregator` from tick steps silently dropped**

`store::on_tick` is always called with `false` as the `is_aggregator` argument for `tick` steps. The `step.is_aggregator` field parsed from the fixture is used only for `attestation` steps. If a fixture ever specifies `"isAggregator": true` on a `tick` step to drive aggregation at interval 2, the field would be silently ignored and `aggregate_committee_signatures` would never run during that tick, causing the test to produce unexpected state without any warning.

Consider using `step.is_aggregator.unwrap_or(false)` here to respect the fixture's intent:

```suggestion
                    store::on_tick(&mut store, timestamp_ms, has_proposal, step.is_aggregator.unwrap_or(false));
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/blockchain/src/store.rs
Line: 679-699

Comment:
**Double parent-state lookup between `on_block` and `on_block_core`**

`on_block` fetches `parent_state` from storage (line 688–695) to pass to `verify_signatures`, then `on_block_core` fetches it again (lines 721–728) for the state transition. Storage reads are cheap for the in-memory backend used in tests, but on RocksDB each call deserializes the full state (which can be large). Since `on_block` already owns the state after the lookup, passing it through to `on_block_core` (or caching it) would avoid the duplicate deserialisation in the production path.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "test: exercise real signature verificati..." | Re-trigger Greptile

Comment thread crates/blockchain/tests/forkchoice_spectests.rs
Swap the fork-choice harness's `_without_verification` shortcuts for
real signature verification where it makes sense, and collapse the
aggregated-attestation paths back into a single verifying entry point.

Changes:
- `store.rs`: delete `on_gossip_attestation_without_verification`; the
  test harness now uses the real verifying `on_gossip_attestation`.
  `on_block` / `on_block_without_verification` / `on_block_core(verify)`
  stay as they are — fork-choice fixtures don't ship block signatures,
  so the test harness still calls `on_block_without_verification`.
- `test-fixtures` crate: hoist `deser_xmss_hex` out of
  `signature_types.rs` so both blockchain test binaries can share it.
- `forkchoice_spectests.rs`: `attestation` steps decode the fixture's
  real XMSS signature and call `on_gossip_attestation(&signed, is_aggregator)`.
  `gossipAggregatedAttestation` steps now carry the real ~180 KB
  aggregated proof bytes through to `on_gossip_aggregated_attestation`.
  Extract `assert_step_outcome` helper to replace three duplicated
  match blocks. `AttestationStepData::signature` deserializes straight
  to `Option<XmssSignature>` via a new `deser_opt_xmss_hex`.
- `SKIP_TESTS`: replace the previous placeholder skips with the 13
  fixtures whose aggregated XMSS proofs are rejected because our
  verifier pins `leanEthereum/leanMultisig@2eb4b9d` while the fixtures
  are generated by `anshalshukla/leanMultisig@devnet-4`. The two forks
  disagree on the Fiat-Shamir `public_input` encoding and the
  aggregation bytecode program — every proof rejects with
  `InvalidProof`. Resolving that fork mismatch is a separate,
  devnet-wide coordination issue.

Previously-skipped fixtures that now pass with real verification:
- `test_gossip_attestation_with_invalid_signature` (correctly rejected)
- `test_equivocating_proposer_with_split_attestations`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant