perf(stark): borrow trace rows in place for prover transition eval #767

Closed
MauroToscano wants to merge 1 commit into feat/single-source-constraints-switch from perf/row-frame-view

@MauroToscano

Contributor

Stacked on #764. Candidate from the perf investigation — bench in isolation before deciding.

What

The LDE buffers have been row-major since the row-major LDE rework, but the evaluator still gather-copied every main + aux column of every transition offset into an owned Frame on each LDE point (~150–200 element clones per row per table) before the constraint body ran.

This replaces the Prover context's frame with RowFrame: one borrowed (main, aux) row-slice pair per transition offset, taken straight from the row-major storage with the same cyclic row arithmetic. The ProverEvalFolder and the IR interpreter read rows[offset][col] directly. The per-thread preallocated Frame and fill_from_lde are deleted; single-row steps are the only shape left since virtual columns were removed, and that is now asserted. Frame stays for the verifier/OOD path and for debug validation, bridged via Frame::as_row_frame.
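For readers unfamiliar with the pattern, here is a minimal sketch of borrowing rows in place instead of gather-copying them. It is not the PR's code: `RowMajor`, `borrow_at`, `main_at`, `aux_at`, the `stride` parameter and the use of `u64` as a stand-in field element are all assumptions made for illustration, and the real RowFrame may avoid the per-point `Vec` used here.

```rust
/// Row-major table: `width` elements per row, rows stored back to back.
/// Hypothetical stand-in for the prover's LDE buffers.
pub struct RowMajor<T> {
    pub data: Vec<T>,
    pub width: usize,
}

impl<T> RowMajor<T> {
    pub fn num_rows(&self) -> usize {
        self.data.len() / self.width
    }

    /// Borrow row `r` as a slice; no element is cloned.
    pub fn row(&self, r: usize) -> &[T] {
        &self.data[r * self.width..(r + 1) * self.width]
    }
}

/// One borrowed (main, aux) row-slice pair per transition offset.
pub struct RowFrame<'a, T> {
    pub rows: Vec<(&'a [T], &'a [T])>,
}

impl<'a, T> RowFrame<'a, T> {
    /// Build the frame for LDE point `point`.
    /// `stride` is the row distance between consecutive trace steps
    /// (assumed here; in an LDE this would typically be the blowup factor).
    /// Row indices wrap around cyclically.
    pub fn borrow_at(
        main: &'a RowMajor<T>,
        aux: &'a RowMajor<T>,
        point: usize,
        offsets: &[usize],
        stride: usize,
    ) -> Self {
        let n = main.num_rows();
        let rows = offsets
            .iter()
            .map(|&off| {
                let r = (point + off * stride) % n;
                (main.row(r), aux.row(r))
            })
            .collect();
        RowFrame { rows }
    }

    /// Read a main-trace cell: the `rows[offset][col]` access pattern.
    pub fn main_at(&self, offset_idx: usize, col: usize) -> &T {
        &self.rows[offset_idx].0[col]
    }

    /// Read an aux-trace cell.
    pub fn aux_at(&self, offset_idx: usize, col: usize) -> &T {
        &self.rows[offset_idx].1[col]
    }
}
```

In this sketch, building a frame costs one index computation and two slice borrows per offset, regardless of how many columns the table has; an owned frame would instead clone every column of every offset.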

Reads the same values from the same memory — proofs are unchanged. Gates: stark 165/165 (incl. folder-vs-capture differentials), emit differentials 13/13, release e2e prove→verify, make lint.

How to bench

Isolated against the base branch (recommended — measures only this change):

scripts/bench_abba.sh origin/perf/row-frame-view origin/feat/single-source-constraints-switch 20

Or /bench-abba on this PR (measures vs main, i.e. this change + the whole switch branch).

The LDE buffers have been row-major since the row-major LDE rework, but
the evaluator still gather-copied every main and aux column of every
transition offset into an owned Frame on each LDE point (~150-200
element clones per row per table) before the constraint body ran.

Replace the Prover context's frame with RowFrame: one borrowed
(main, aux) row-slice pair per transition offset, taken straight from
the row-major storage with the same cyclic row arithmetic. The folder
and the IR interpreter read rows[offset][col] directly; the per-thread
preallocated Frame and fill_from_lde are deleted (single-row steps
only - the sole shape since virtual columns were removed; asserted).
Frame stays for the verifier/OOD path and debug validation, which
bridges via Frame::as_row_frame.

Reads the same values from the same memory - proofs are unchanged.
@MauroToscano

Contributor Author

/bench-abba 32

@github-actions

github-actions Bot commented Jul 2, 2026


ABBA tiebreaker started on the bench server (~30–40 min). The bench server is occupied until it finishes.

@MauroToscano

Contributor Author

Folded into #764 as commit 3638d7a (cherry-picked onto the cleaned-up tip — the micro-op bundle this branch originally stacked on has been reverted there; this change is the one that measured a real win, ~0.8% on ABBA). Branch kept for provenance.

@github-actions

github-actions Bot commented Jul 2, 2026


ABBA tiebreaker — c1653d9a0d vs main (32 pairs)

❌ Run failed. Last log lines:

   pair  2/32   A=16.892s  B=16.660s   PR -1.39% (+=faster)
   pair  3/32   A=17.264s  B=16.499s   PR -4.64% (+=faster)
   pair  4/32   A=16.990s  B=16.816s   PR -1.03% (+=faster)
   pair  5/32   A=17.068s  B=16.940s   PR -0.76% (+=faster)
   pair  6/32   A=16.538s  B=16.868s   PR +1.96% (+=faster)
   pair  7/32   A=16.595s  B=16.515s   PR -0.48% (+=faster)
   pair  8/32   A=17.033s  B=16.727s   PR -1.83% (+=faster)
   pair  9/32   A=16.655s  B=16.418s   PR -1.44% (+=faster)
   pair 10/32   A=16.712s  B=16.796s   PR +0.50% (+=faster)
   pair 11/32   A=16.729s  B=16.855s   PR +0.75% (+=faster)
   pair 12/32   A=16.760s  B=16.703s   PR -0.34% (+=faster)
   pair 13/32   A=16.946s  B=16.587s   PR -2.16% (+=faster)
   pair 14/32   A=16.882s  B=16.776s   PR -0.63% (+=faster)
   pair 15/32   A=16.786s  B=16.968s   PR +1.07% (+=faster)
   pair 16/32   A=16.815s  B=16.790s   PR -0.15% (+=faster)
   pair 17/32   A=16.566s  B=16.597s   PR +0.19% (+=faster)
   pair 18/32   A=16.885s  B=16.773s   PR -0.67% (+=faster)
   pair 19/32   A=16.894s  B=16.800s   PR -0.56% (+=faster)
   pair 20/32   A=16.634s  B=16.419s   PR -1.31% (+=faster)
   pair 21/32   A=16.633s  B=16.807s   PR +1.04% (+=faster)
   pair 22/32   A=16.959s  B=16.693s   PR -1.59% (+=faster)
   pair 23/32   A=16.770s  B=17.010s   PR +1.41% (+=faster)
   pair 24/32   A=16.835s  B=17.021s   PR +1.09% (+=faster)
   pair 25/32   A=16.767s  B=16.619s   PR -0.89% (+=faster)
   pair 26/32   A=16.639s  B=16.678s   PR +0.23% (+=faster)
   pair 27/32   A=16.948s  B=16.718s   PR -1.38% (+=faster)
   pair 28/32   A=16.780s  B=16.699s   PR -0.49% (+=faster)
   pair 29/32   A=16.558s  B=16.654s   PR +0.58% (+=faster)
   pair 30/32   A=16.901s  B=16.978s   PR +0.45% (+=faster)
   pair 31/32   A=16.618s  B=16.875s   PR +1.52% (+=faster)
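
The per-pair percentages in the log are consistent with (B − A) / B × 100, which matches the "+=faster" convention if A is the PR build and B is the base. That reading is inferred from the printed numbers, not from bench_abba.sh itself, so treat the sketch below as an assumption; the function names are hypothetical.

```rust
/// Relative time saved by run A compared with run B, in percent.
/// Positive means A finished faster than B.
/// Assumes A is the PR build and B is the base (inferred from the log).
pub fn pr_delta_percent(a_secs: f64, b_secs: f64) -> f64 {
    (b_secs - a_secs) / b_secs * 100.0
}

/// Mean of the per-pair deltas, or `None` when there are no pairs.
pub fn mean_delta_percent(pairs: &[(f64, f64)]) -> Option<f64> {
    if pairs.is_empty() {
        return None;
    }
    let sum: f64 = pairs
        .iter()
        .map(|&(a, b)| pr_delta_percent(a, b))
        .sum();
    Some(sum / pairs.len() as f64)
}
```

For example, `pr_delta_percent(16.892, 16.660)` formatted with `{:+.2}` reproduces the `-1.39` shown for pair 2.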
