perf(stark): fuse deep-composition reconstruction for both FRI points #775

Open
Oppen wants to merge 3 commits into perf/rkyv-serialization from perf/deep-reconstruction


Oppen commented Jul 3, 2026

Collaborator

Summary

  • Fuses the regular/symmetric-evaluation-point passes of
    reconstruct_deep_composition_poly_evaluation (STARK verifier, FRI step)
    into one traversal, sharing the OOD-table walk, the trace_term_coeffs
    walk, and both inplace_batch_inverse calls per query. The core trick: coeff*(base-ood)*denom
    distributes to denom*(coeff*base - coeff*ood), isolating the
    point-independent coeff*ood term (computed once, shared between both
    points) and letting base-field columns use the cheap IsSubFieldOf
    asymmetric multiply instead of a full extension-field product.
  • Follow-up: dedupes the z*g^k ladder walk within a query (was computed
    twice, once per evaluation point), hand-rolls the 2-element
    composition-tail batch inverse to avoid a Vec allocation, and marks the
    IsSubFieldOf<Degree{2,3}GoldilocksExtensionField> mul/add/sub impls
    #[inline(always)] to match their sibling IsField impls.
  • Trims a doc-comment sentence that duplicated an inline comment.
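
The rewrite in the first bullet can be checked on a toy model. The sketch below is an illustration only, not the verifier's code: it uses plain u64 arithmetic modulo the Goldilocks prime instead of the repository's field types, and the names term_unfused and terms_fused are hypothetical. Because everything lives in one prime field here, it shows the algebraic identity and the shared coeff*ood term, but not the base-field versus extension-field cost difference.

```rust
// Toy model over the Goldilocks prime p = 2^64 - 2^32 + 1.
// All function names are hypothetical; inputs are assumed already reduced mod p.
const P: u128 = 0xFFFF_FFFF_0000_0001;

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P - b as u128) % P) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P) as u64
}

/// Two-call shape: coeff * (base - ood) * denom, evaluated once per point.
fn term_unfused(coeff: u64, base: u64, ood: u64, denom: u64) -> u64 {
    mul(mul(coeff, sub(base, ood)), denom)
}

/// Fused shape: denom * (coeff*base - coeff*ood) for both points,
/// with the point-independent coeff*ood computed a single time.
fn terms_fused(coeff: u64, ood: u64, base: [u64; 2], denom: [u64; 2]) -> [u64; 2] {
    let shared = mul(coeff, ood);
    [
        mul(denom[0], sub(mul(coeff, base[0]), shared)),
        mul(denom[1], sub(mul(coeff, base[1]), shared)),
    ]
}

fn main() {
    let coeff: u64 = 0x1234_5678_9ABC_DEF0;
    let ood: u64 = 0x0FED_CBA9_8765_4321;
    let base: [u64; 2] = [7, 0xDEAD_BEEF];
    let denom: [u64; 2] = [0x1111_2222_3333_4444, 0x5555_6666_7777_8888];

    let fused = terms_fused(coeff, ood, base, denom);
    assert_eq!(fused[0], term_unfused(coeff, base[0], ood, denom[0]));
    assert_eq!(fused[1], term_unfused(coeff, base[1], ood, denom[1]));
    println!("fused and unfused terms agree");
}
```

In this shape the fused version spends one multiplication on the shared term and two more per point, and the only product involving base is coeff*base, which is the one the summary says can use the cheaper asymmetric multiply for base-field columns.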

Measured impact

Multi-query recursion profile (make test-profile-recursion-multi; the
underlying proof has ~40-45K cycles of run-to-run variance from
stark::grinding::generate_nonce's parallel find_any search, so all
deltas below are 2-sample-cluster-to-cluster, not point-to-point):

  • Pre-fusion baseline: 2,210,366,539 cycles
  • After the fusion (85fa5b3e): ~1,947,242,710 cycles (-11.9%)
  • After the follow-up micro-opts (3ac5e494): ~1,939,857,910 cycles
    (a further -0.38%, -12.2% total from baseline)
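
The relative changes can be recomputed from the three cycle counts listed above. The helper pct_change below is a hypothetical name used only for this check; it gives roughly -11.9% for the fusion, -0.38% for the follow-up relative to the fusion, and -12.2% for the follow-up relative to the baseline.

```rust
/// Relative change from `before` to `after`, in percent (negative means fewer cycles).
fn pct_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}

fn main() {
    // Cycle counts as reported in the list above.
    let baseline: u64 = 2_210_366_539;
    let fused: u64 = 1_947_242_710;
    let follow_up: u64 = 1_939_857_910;

    println!("fusion vs baseline:    {:.2}%", pct_change(baseline, fused));
    println!("follow-up vs fusion:   {:.2}%", pct_change(fused, follow_up));
    println!("follow-up vs baseline: {:.2}%", pct_change(baseline, follow_up));
}
```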

Single-query and step-3 (FRI) improvements are proportionally similar; see
individual commit messages for per-commit numbers.

Soundness

Every guard the two-call version had (base/aux-length checks, the
trace_term_coeffs sanity check, batch-inverse zero-rejection, the
composition-poly-parts-count check) is reproduced exactly or made
stricter in the fused version. Independent adversarial review agents
checked this against the reject-path semantics, the algebraic identity,
and the soundness threat model: a dishonest prover cannot craft a proof
that this code accepts and the prior code would have rejected.

Test plan

  • make test (workspace-wide; only failure is math-cuda, which needs
    a GPU unavailable in this environment — pre-existing, unrelated)
  • make test-ethrex
  • cargo test -p lambda-vm-prover --lib test_recursion_execute_1query -- --ignored --nocapture (in-VM end-to-end verify, output digest
    unchanged across all three commits)
  • Two independent adversarial review passes (soundness + correctness,
    then a second round covering performance/simplicity) on the diff

Oppen added 3 commits July 3, 2026 15:15
Reconstructing the deep composition polynomial per query walked the
OOD table, the trace-term coefficients, and inverted the trace/comp
denominators independently for the regular and symmetric evaluation
points, even though the OOD-derived terms don't depend on the point.
Rewriting coeff*(base-ood)*denom as denom*(coeff*base - coeff*ood)
isolates the point-independent coeff*ood term (computed once, shared
between both points) from coeff*base (cheap base*ext multiply for
base-field columns), and lets both denom sets batch-invert together.

Multi-query recursion profile: 2,210,366,539 -> 1,951,764,531 cycles
(-11.7%); step 3 (FRI) 912M -> 652M cycles (-28.5%).

The doc comment on reconstruct_deep_composition_poly_evaluation_pair
restated the inline comment three lines into the function body.

reconstruct_deep_composition_poly_evaluation_pair walked the z*g^k
ladder twice per query (once per evaluation point) though it doesn't
depend on which point is being evaluated; interleave the two
denominator pushes into one walk instead. The 2-element composition-
tail batch inverse also went through the general Vec-allocating
inplace_batch_inverse; hand-roll it (one inversion, three muls, no
allocation).
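
A two-element batch inverse with that cost profile can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this commit: it works on u64 values modulo the Goldilocks prime rather than on the extension-field types the verifier uses, inv is a plain Fermat exponentiation standing in for whatever inversion the field provides, and batch_inverse_2 is a hypothetical name.

```rust
// Toy model over the Goldilocks prime p = 2^64 - 2^32 + 1.
// Inputs are assumed already reduced mod p.
const P: u128 = 0xFFFF_FFFF_0000_0001;

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P) as u64
}

/// Field inverse as a^(p-2) mod p (Fermat's little theorem); None for zero.
fn inv(a: u64) -> Option<u64> {
    if a == 0 {
        return None;
    }
    let mut result: u64 = 1;
    let mut base = a;
    let mut exp = (P - 2) as u64;
    while exp > 0 {
        if exp & 1 == 1 {
            result = mul(result, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    Some(result)
}

/// Inverts two elements with one inversion and three multiplications,
/// using no heap allocation. Returns None if either input is zero,
/// because the product a*b is then zero and has no inverse.
fn batch_inverse_2(a: u64, b: u64) -> Option<(u64, u64)> {
    let ab_inv = inv(mul(a, b))?; // multiplication 1, the single inversion
    Some((mul(ab_inv, b), mul(ab_inv, a))) // multiplications 2 and 3
}

fn main() {
    let (a, b): (u64, u64) = (3, 0xDEAD_BEEF);
    let (a_inv, b_inv) = batch_inverse_2(a, b).expect("nonzero inputs");
    assert_eq!(mul(a, a_inv), 1);
    assert_eq!(mul(b, b_inv), 1);
    assert!(batch_inverse_2(a, 0).is_none());
    println!("both inverses verified");
}
```

The zero check falls out of the single inversion: a field has no zero divisors, so the product is zero exactly when one of the inputs is, which keeps the zero-rejection behaviour of a general batch inverse.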

Also mark the IsSubFieldOf<Degree{2,3}GoldilocksExtensionField> mul/
add/sub impls for GoldilocksField #[inline(always)], matching the
sibling IsField impls on the same types — these are exactly the cheap
asymmetric ops the deep-composition reconstruction fusion relies on.

Multi-query recursion profile (two make-rebuilt runs per side, since
the underlying proof has run-to-run variance): baseline 1,947,221,680
/ 1,947,263,739 cycles; with this change 1,939,869,854 / 1,939,845,966
(about -7.38M on average, -0.38%). No correctness change (in-VM verify output digest
identical to the prior commit).