perf(stark): fuse deep-composition reconstruction for both FRI points by Oppen · Pull Request #775 · yetanotherco/lambda_vm

Oppen · 2026-07-03T19:45:28Z

Summary

Fuses the regular/symmetric-evaluation-point passes of
reconstruct_deep_composition_poly_evaluation (STARK verifier, FRI step)
into one traversal, sharing the OOD-table walk, the trace_term_coeffs
walk, and both inplace_batch_inverse calls per query. The core trick:
coeff*(base-ood)*denom distributes to denom*(coeff*base - coeff*ood),
isolating the point-independent coeff*ood term (computed once, shared
between both points) and letting base-field columns use the cheap
IsSubFieldOf asymmetric multiply instead of a full extension-field product.

Follow-up: dedupes the z*g^k ladder walk within a query (was computed
twice, once per evaluation point), hand-rolls the 2-element
composition-tail batch inverse to avoid a Vec allocation, and marks the
IsSubFieldOf<Degree{2,3}GoldilocksExtensionField> mul/add/sub impls
#[inline(always)] to match their sibling IsField impls.

Trims a doc-comment sentence that duplicated an inline comment.
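The identity behind the fusion can be checked in a toy model. The sketch below is not the PR's code: it works over the Goldilocks prime field only (no extension field, so it does not model the IsSubFieldOf asymmetric multiply), it models a single OOD row with one denominator per point, and every function name in it is hypothetical.

```rust
// Toy model: every value is a Goldilocks base-field element, reduced via u128.
// Assumption: one OOD row, so each point has a single denominator.
const P: u128 = 0xFFFF_FFFF_0000_0001; // 2^64 - 2^32 + 1

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P - (b as u128 % P)) % P) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P) as u64
}

/// Two-call shape: one point at a time, summing coeff * (base - ood) * denom.
fn term_unfused(coeffs: &[u64], bases: &[u64], oods: &[u64], denom: u64) -> u64 {
    assert!(coeffs.len() == bases.len() && coeffs.len() == oods.len());
    let mut acc = 0u64;
    for i in 0..coeffs.len() {
        acc = add(acc, mul(mul(coeffs[i], sub(bases[i], oods[i])), denom));
    }
    acc
}

/// Fused shape: sum(coeff * ood) is accumulated once and reused for both points,
/// then each point applies denom * (sum(coeff * base) - shared).
fn terms_fused(
    coeffs: &[u64],
    bases_reg: &[u64],
    bases_sym: &[u64],
    oods: &[u64],
    denom_reg: u64,
    denom_sym: u64,
) -> (u64, u64) {
    assert!(coeffs.len() == bases_reg.len() && coeffs.len() == bases_sym.len());
    assert!(coeffs.len() == oods.len());
    let (mut shared, mut reg, mut sym) = (0u64, 0u64, 0u64);
    for i in 0..coeffs.len() {
        shared = add(shared, mul(coeffs[i], oods[i])); // point-independent
        reg = add(reg, mul(coeffs[i], bases_reg[i]));
        sym = add(sym, mul(coeffs[i], bases_sym[i]));
    }
    (mul(denom_reg, sub(reg, shared)), mul(denom_sym, sub(sym, shared)))
}
```

Calling terms_fused once should return the same pair as calling term_unfused twice, once with each point's bases and denominator; the fused loop performs one coeff*ood product per column instead of two.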

Measured impact

Multi-query recursion profile (make test-profile-recursion-multi). The
underlying proof has ~40-45K cycles of run-to-run variance from
stark::grinding::generate_nonce's parallel find_any search, so every
delta below compares the means of two-run clusters rather than single runs:

Pre-fusion baseline: 2,210,366,539 cycles
After the fusion (85fa5b3e): ~1,947,242,710 cycles (-11.9%)
After the follow-up micro-opts (3ac5e494): ~1,939,857,910 cycles
(a further -0.38%, -12.2% total from baseline)

Single-query and step-3 (FRI) improvements are proportionally similar; see
individual commit messages for per-commit numbers.
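The cluster-to-cluster comparison can be reproduced with a few lines of arithmetic. This is a minimal sketch with hypothetical helper names; the per-run cycle counts fed to it in the usage note are the ones quoted in the follow-up commit message.

```rust
/// Mean of a cluster of cycle counts (returns NaN for an empty cluster).
fn cluster_mean(samples: &[u64]) -> f64 {
    samples.iter().map(|&s| s as f64).sum::<f64>() / samples.len() as f64
}

/// Relative change from `before` to `after` in percent; negative means fewer cycles.
fn percent_delta(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For example, cluster_mean(&[1_947_221_680, 1_947_263_739]) and cluster_mean(&[1_939_869_854, 1_939_845_966]) give the two post-fusion figures listed above, and percent_delta between them comes out at roughly -0.38%.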

Soundness

Every guard the two-call version had (base/aux-length checks, the
trace_term_coeffs sanity check, batch-inverse zero-rejection, the
composition-poly-parts-count check) is either reproduced exactly or
tightened in the fused version. This was verified by independent
adversarial review agents against the reject-path semantics, the algebraic
identity, and the soundness threat model (a dishonest prover cannot craft a
proof that this code accepts but the prior code would have rejected).

Test plan

make test (workspace-wide; the only failure is math-cuda, which needs a GPU unavailable in this environment; pre-existing and unrelated)
make test-ethrex
cargo test -p lambda-vm-prover --lib test_recursion_execute_1query -- --ignored --nocapture (in-VM end-to-end verify; output digest unchanged across all three commits)
Two independent adversarial review passes on the diff (soundness + correctness, then a second round covering performance/simplicity)

Reconstructing the deep composition polynomial per query walked the OOD table, the trace-term coefficients, and inverted the trace/comp denominators independently for the regular and symmetric evaluation points, even though the OOD-derived terms don't depend on the point. Rewriting coeff*(base-ood)*denom as denom*(coeff*base - coeff*ood) isolates the point-independent coeff*ood term (computed once, shared between both points) from coeff*base (cheap base*ext multiply for base-field columns), and lets both denom sets batch-invert together. Multi-query recursion profile: 2,210,366,539 -> 1,951,764,531 cycles (-11.7%); step 3 (FRI) 912M -> 652M cycles (-28.5%).
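Inverting both denominator sets together amounts to running one Montgomery batch inversion over their concatenation. The sketch below illustrates that idea under stated assumptions: it uses the Goldilocks base field via u128 reduction, and batch_inverse_in_place and invert_both are hypothetical stand-ins, not the repository's inplace_batch_inverse or its call sites.

```rust
const P: u128 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime, 2^64 - 2^32 + 1

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P) as u64
}

/// Field inverse via Fermat's little theorem (a^(P-2)); None for zero.
fn inv(a: u64) -> Option<u64> {
    if a as u128 % P == 0 {
        return None;
    }
    let (mut base, mut exp, mut acc) = (mul(a, 1), (P - 2) as u64, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    Some(acc)
}

/// Montgomery batch inversion: one field inversion plus O(n) multiplications.
/// Returns None if any element is zero (the slice is left unchanged then).
fn batch_inverse_in_place(values: &mut [u64]) -> Option<()> {
    let mut prefix = Vec::with_capacity(values.len());
    let mut acc = 1u64;
    for &v in values.iter() {
        prefix.push(acc); // product of all earlier elements
        acc = mul(acc, v);
    }
    // A field has no zero divisors, so acc is zero iff some element is zero.
    let mut inv_acc = inv(acc)?;
    for i in (0..values.len()).rev() {
        let v = values[i];
        values[i] = mul(inv_acc, prefix[i]);
        inv_acc = mul(inv_acc, v);
    }
    Some(())
}

/// Inverts the denominators of both evaluation points with a single inversion.
fn invert_both(denoms_reg: &[u64], denoms_sym: &[u64]) -> Option<(Vec<u64>, Vec<u64>)> {
    let mut all: Vec<u64> = denoms_reg.iter().chain(denoms_sym).copied().collect();
    batch_inverse_in_place(&mut all)?;
    let sym = all.split_off(denoms_reg.len());
    Some((all, sym))
}
```

A zero denominator in either set makes invert_both return None, which mirrors the zero-rejection behaviour the description attributes to the batch inverse.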

The doc comment on reconstruct_deep_composition_poly_evaluation_pair restated the inline comment three lines into the function body.

reconstruct_deep_composition_poly_evaluation_pair walked the z*g^k ladder twice per query (once per evaluation point) though it doesn't depend on which point is being evaluated; interleave the two denominator pushes into one walk instead. The 2-element composition-tail batch inverse also went through the general Vec-allocating inplace_batch_inverse; hand-roll it (one inversion, three muls, no allocation). Also mark the IsSubFieldOf<Degree{2,3}GoldilocksExtensionField> mul/add/sub impls for GoldilocksField #[inline(always)], matching the sibling IsField impls on the same types; these are exactly the cheap asymmetric ops the deep-composition reconstruction fusion relies on. Multi-query recursion profile (two make-rebuilt runs per side, since the underlying proof has run-to-run variance): baseline 1,947,221,680 / 1,947,263,739 cycles; with this change 1,939,869,854 / 1,939,845,966 (-~7.37M, -0.38%). No correctness change (in-VM verify output digest identical to the prior commit).
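Both micro-optimizations can be sketched in the same toy setting. The code below is illustrative only: it uses the Goldilocks base field, it assumes the denominators have the form x - z*g^k for the regular point x and -x - z*g^k for the symmetric point, and invert_pair and interleaved_denoms are hypothetical names.

```rust
const P: u128 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime, 2^64 - 2^32 + 1

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P - (b as u128 % P)) % P) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P) as u64
}

/// Field inverse via Fermat's little theorem (a^(P-2)); None for zero.
fn inv(a: u64) -> Option<u64> {
    if a as u128 % P == 0 {
        return None;
    }
    let (mut base, mut exp, mut acc) = (mul(a, 1), (P - 2) as u64, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    Some(acc)
}

/// Inverts two elements with one inversion and three multiplications,
/// without allocating. Returns None if either input is zero.
fn invert_pair(a: u64, b: u64) -> Option<(u64, u64)> {
    let ab = mul(a, b); // multiplication 1
    let ab_inv = inv(ab)?; // the only inversion
    Some((mul(ab_inv, b), mul(ab_inv, a))) // multiplications 2 and 3
}

/// Walks z*g^k once and pushes the denominators of both points interleaved:
/// [x - z, -x - z, x - z*g, -x - z*g, ...].
fn interleaved_denoms(z: u64, g: u64, x: u64, steps: usize) -> Vec<u64> {
    let neg_x = sub(0, x);
    let mut out = Vec::with_capacity(2 * steps);
    let mut point = mul(z, 1); // reduce z into the field
    for _ in 0..steps {
        out.push(sub(x, point)); // regular point
        out.push(sub(neg_x, point)); // symmetric point
        point = mul(point, g); // next rung of the ladder, computed once
    }
    out
}
```

invert_pair relies on 1/a = b/(a*b) and 1/b = a/(a*b), so a zero in either input is caught by the single inversion of the product.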

Oppen added 3 commits July 3, 2026 15:15

docs(stark): drop duplicate IsSubFieldOf note from doc comment

07db8f0


Oppen wants to merge 3 commits into perf/rkyv-serialization from perf/deep-reconstruction
