perf(stark): borrow trace rows in place for prover transition eval by MauroToscano · Pull Request #767 · yetanotherco/lambda_vm

MauroToscano · 2026-07-02T19:39:01Z

Stacked on #764. Candidate from the perf investigation — bench in isolation before deciding.

What

The LDE buffers have been row-major since the row-major LDE rework, but the evaluator still gather-copied every main + aux column of every transition offset into an owned Frame on each LDE point (~150–200 element clones per row per table) before the constraint body ran.

This replaces the Prover context's frame with RowFrame: one borrowed (main, aux) row-slice pair per transition offset, taken straight from the row-major storage with the same cyclic row arithmetic. The ProverEvalFolder and the IR interpreter read rows[offset][col] directly. The per-thread preallocated Frame and fill_from_lde are deleted; single-row steps are the only shape left since virtual columns were removed, and this is now asserted. Frame stays for the verifier/OOD path and for debug validation, which bridges to the borrowed view via Frame::as_row_frame.

Reads the same values from the same memory — proofs are unchanged. Gates: stark 165/165 (incl. folder-vs-capture differentials), emit differentials 13/13, release e2e prove→verify, make lint.
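A minimal sketch of the idea described above, not the code in this PR: `RowMajor`, the `u64` element type, the `stride` parameter, the const-generic `N`, and `gather_copy` are all hypothetical stand-ins chosen for illustration. Only the name `RowFrame` and the `rows[offset][col]` access shape come from the description. It shows a borrowed view and an owned gather-copy built from the same row-major buffer with the same wrap-around indexing, so both read identical values.

```rust
/// Hypothetical row-major table: `width` elements per row, rows stored back to back.
/// `u64` stands in for the real field element type.
pub struct RowMajor {
    pub data: Vec<u64>,
    pub width: usize,
}

impl RowMajor {
    pub fn num_rows(&self) -> usize {
        self.data.len() / self.width
    }

    /// Borrow row `i`, wrapping around the end of the table (cyclic indexing).
    pub fn row(&self, i: usize) -> &[u64] {
        let r = i % self.num_rows();
        &self.data[r * self.width..(r + 1) * self.width]
    }
}

/// Borrowed view: one (main, aux) row-slice pair per transition offset.
pub struct RowFrame<'a, const N: usize> {
    rows: [(&'a [u64], &'a [u64]); N],
}

impl<'a, const N: usize> RowFrame<'a, N> {
    /// `stride` is the distance in LDE rows between consecutive trace rows
    /// (an assumption of this sketch).
    pub fn new(
        main: &'a RowMajor,
        aux: &'a RowMajor,
        row: usize,
        offsets: [usize; N],
        stride: usize,
    ) -> Self {
        // No element is cloned here: each entry is a pair of slices into the tables.
        let rows = std::array::from_fn(|k| {
            let i = row + offsets[k] * stride;
            (main.row(i), aux.row(i))
        });
        RowFrame { rows }
    }

    pub fn main(&self, offset: usize, col: usize) -> u64 {
        self.rows[offset].0[col]
    }

    pub fn aux(&self, offset: usize, col: usize) -> u64 {
        self.rows[offset].1[col]
    }
}

/// The owned alternative, for comparison: clones every column of every offset.
pub fn gather_copy<const N: usize>(
    main: &RowMajor,
    aux: &RowMajor,
    row: usize,
    offsets: [usize; N],
    stride: usize,
) -> Vec<(Vec<u64>, Vec<u64>)> {
    offsets
        .iter()
        .map(|&o| {
            let i = row + o * stride;
            (main.row(i).to_vec(), aux.row(i).to_vec())
        })
        .collect()
}
```

In this sketch, building a `RowFrame` costs one pair of slice references per offset, while `gather_copy` clones `width` elements per table per offset. That per-row cloning is the work the description says was removed.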

How to bench

Isolated against the base branch (recommended — measures only this change):

scripts/bench_abba.sh origin/perf/row-frame-view origin/feat/single-source-constraints-switch 20

Or /bench-abba on this PR (measures vs main, i.e. this change + the whole switch branch).

The LDE buffers have been row-major since the row-major LDE rework, but the evaluator still gather-copied every main and aux column of every transition offset into an owned Frame on each LDE point (~150-200 element clones per row per table) before the constraint body ran. Replace the Prover context's frame with RowFrame: one borrowed (main, aux) row-slice pair per transition offset, taken straight from the row-major storage with the same cyclic row arithmetic. The folder and the IR interpreter read rows[offset][col] directly; the per-thread preallocated Frame and fill_from_lde are deleted (single-row steps only - the sole shape since virtual columns were removed; asserted). Frame stays for the verifier/OOD path and debug validation, which bridges via Frame::as_row_frame. Reads the same values from the same memory - proofs are unchanged.

MauroToscano · 2026-07-02T19:40:20Z

/bench-abba 32

github-actions · 2026-07-02T19:46:59Z

⏳ ABBA tiebreaker started on the bench server (~30–40 min). The bench server is occupied until it finishes.

MauroToscano · 2026-07-02T20:06:35Z

Folded into #764 as commit 3638d7a (cherry-picked onto the cleaned-up tip — the micro-op bundle this branch originally stacked on has been reverted there; this change is the one that measured a real win, ~0.8% on ABBA). Branch kept for provenance.

github-actions · 2026-07-02T20:06:51Z

ABBA tiebreaker — `c1653d9a0d` vs `main` (32 pairs)

❌ Run failed. Last log lines:

   pair  2/32   A=16.892s  B=16.660s   PR -1.39% (+=faster)
   pair  3/32   A=17.264s  B=16.499s   PR -4.64% (+=faster)
   pair  4/32   A=16.990s  B=16.816s   PR -1.03% (+=faster)
   pair  5/32   A=17.068s  B=16.940s   PR -0.76% (+=faster)
   pair  6/32   A=16.538s  B=16.868s   PR +1.96% (+=faster)
   pair  7/32   A=16.595s  B=16.515s   PR -0.48% (+=faster)
   pair  8/32   A=17.033s  B=16.727s   PR -1.83% (+=faster)
   pair  9/32   A=16.655s  B=16.418s   PR -1.44% (+=faster)
   pair 10/32   A=16.712s  B=16.796s   PR +0.50% (+=faster)
   pair 11/32   A=16.729s  B=16.855s   PR +0.75% (+=faster)
   pair 12/32   A=16.760s  B=16.703s   PR -0.34% (+=faster)
   pair 13/32   A=16.946s  B=16.587s   PR -2.16% (+=faster)
   pair 14/32   A=16.882s  B=16.776s   PR -0.63% (+=faster)
   pair 15/32   A=16.786s  B=16.968s   PR +1.07% (+=faster)
   pair 16/32   A=16.815s  B=16.790s   PR -0.15% (+=faster)
   pair 17/32   A=16.566s  B=16.597s   PR +0.19% (+=faster)
   pair 18/32   A=16.885s  B=16.773s   PR -0.67% (+=faster)
   pair 19/32   A=16.894s  B=16.800s   PR -0.56% (+=faster)
   pair 20/32   A=16.634s  B=16.419s   PR -1.31% (+=faster)
   pair 21/32   A=16.633s  B=16.807s   PR +1.04% (+=faster)
   pair 22/32   A=16.959s  B=16.693s   PR -1.59% (+=faster)
   pair 23/32   A=16.770s  B=17.010s   PR +1.41% (+=faster)
   pair 24/32   A=16.835s  B=17.021s   PR +1.09% (+=faster)
   pair 25/32   A=16.767s  B=16.619s   PR -0.89% (+=faster)
   pair 26/32   A=16.639s  B=16.678s   PR +0.23% (+=faster)
   pair 27/32   A=16.948s  B=16.718s   PR -1.38% (+=faster)
   pair 28/32   A=16.780s  B=16.699s   PR -0.49% (+=faster)
   pair 29/32   A=16.558s  B=16.654s   PR +0.58% (+=faster)
   pair 30/32   A=16.901s  B=16.978s   PR +0.45% (+=faster)
   pair 31/32   A=16.618s  B=16.875s   PR +1.52% (+=faster)
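For reading the log above: the printed percentages are consistent with (B - A) / B × 100, which is positive when the A run took less time than the B run. The sketch below recomputes that figure from a log line. The function names and the parsing approach are hypothetical and are not taken from the bench script, whose actual formula is not shown here.

```rust
/// Percent difference between the two runs of a pair, relative to B.
/// Positive when A took less time than B.
pub fn delta_percent(a_secs: f64, b_secs: f64) -> f64 {
    (b_secs - a_secs) / b_secs * 100.0
}

/// Extract the A and B timings (in seconds) from one log line, e.g.
/// "pair  6/32   A=16.538s  B=16.868s   PR +1.96% (+=faster)".
/// Returns `None` if either timing is missing or malformed.
pub fn parse_pair(line: &str) -> Option<(f64, f64)> {
    let mut a = None;
    let mut b = None;
    for tok in line.split_whitespace() {
        if let Some(v) = tok.strip_prefix("A=") {
            a = v.trim_end_matches('s').parse::<f64>().ok();
        } else if let Some(v) = tok.strip_prefix("B=") {
            b = v.trim_end_matches('s').parse::<f64>().ok();
        }
    }
    Some((a?, b?))
}

/// Mean delta over all parsable lines; `None` if no line parses.
pub fn mean_delta(log: &str) -> Option<f64> {
    let deltas: Vec<f64> = log
        .lines()
        .filter_map(parse_pair)
        .map(|(a, b)| delta_percent(a, b))
        .collect();
    if deltas.is_empty() {
        None
    } else {
        Some(deltas.iter().sum::<f64>() / deltas.len() as f64)
    }
}
```

Passing the pasted log lines to `mean_delta` gives the average over the pairs that were printed. The run was reported as failed and the excerpt covers only pairs 2 to 31, so any such average is partial.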

MauroToscano closed this Jul 2, 2026

MauroToscano wants to merge 1 commit into feat/single-source-constraints-switch from perf/row-frame-view
