fix(bb/msm): stream-walker bucket_sums off-curve — exception-safe split-bucket combine by AztecBot · Pull Request #23740 · AztecProtocol/aztec-packages

AztecBot · 2026-05-30T03:32:34Z

Summary

Fixes a pre-existing correctness bug in the stream-walker MSM that returned
off-curve bucket_sums (and therefore wrong MSM results) on hot
buckets. A previous session (PR #23726) diagnosed the symptom as a
"non-deterministic torn-write race on bucket_sums". A GPU-vs-@noble/curves
cross-check plus per-bucket readback under headless SwiftShader has now pinned
down the actual root cause, which this PR fixes.

Base: stream-walker-impl. No change to msm_v2.ts orchestration — only the
WGSL kernels (+ regenerated _generated/shaders.ts) and a dev cross-check
harness.

Root cause (proven, not guessed)

The walker's per-bucket partials are all correct and on-curve — the bug is
entirely downstream, in ba_walker_combine, which sums a bucket's partials with
plain affine point addition (dx = px − acc_x, then 1/dx). That formula
divides by zero whenever a running prefix sum equals ±(the next partial):

P == acc → a point doubling (needs the doubling slope 3x²/(2y));
P == −acc → an intermediate point-at-infinity.

For a hot bucket (one bucket split across many tasks → dozens of partials),
the partials are walked in the CAS-insertion order of the per-bucket linked
list, which is non-deterministic across GPU runs. In a generic order at
least one prefix hits one of those exceptional cases, so the unguarded affine
add produced off-curve garbage whose value changed run-to-run with the CAS
order. That is exactly the "non-deterministic torn write" the prior session observed.
SwiftShader executes serially, so the insertion order (and with it the bad
prefix) is the same on every run and the failure reproduced deterministically;
replaying the linked-list order confirmed exactly one dx==0 per off-curve
bucket (LL_ORDER_DxZero=1, mostly the infinity case).
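
The failure mode described above can be reproduced outside the GPU. The following is a minimal sketch, not the kernel code: it assumes the BN254 base field and curve y² = x³ + 3 (the PR does not name the curve), and it models "1/dx with dx == 0" as 0, which is what a Fermat-style inversion returns. The names `naiveAdd`, `isOnCurve` and friends are illustrative only.

```typescript
// Assumption: BN254 base field, curve y^2 = x^3 + 3, generator (1, 2).
const P_MOD = 21888242871839275222246405745257275088696311157297823662689037894645226208583n;

type Affine = { x: bigint; y: bigint };

const mod = (a: bigint): bigint => ((a % P_MOD) + P_MOD) % P_MOD;

// Square-and-multiply exponentiation in the base field.
function modPow(base: bigint, exp: bigint): bigint {
  let result = 1n;
  let b = mod(base);
  for (let e = exp; e > 0n; e >>= 1n) {
    if ((e & 1n) === 1n) result = mod(result * b);
    b = mod(b * b);
  }
  return result;
}

// Fermat inverse. inv(0) evaluates to 0: a modelling assumption for "dividing by zero".
const inv = (a: bigint): bigint => modPow(a, P_MOD - 2n);

const isOnCurve = (p: Affine): boolean =>
  mod(p.y * p.y) === mod(p.x * p.x * p.x + 3n);

// Unguarded affine addition with the dx = px - acc_x, then 1/dx shape.
function naiveAdd(acc: Affine, p: Affine): Affine {
  const dx = mod(p.x - acc.x);
  const lam = mod((p.y - acc.y) * inv(dx));
  const x3 = mod(lam * lam - acc.x - p.x);
  const y3 = mod(lam * (acc.x - x3) - acc.y);
  return { x: x3, y: y3 };
}

const G: Affine = { x: 1n, y: 2n };
const negG: Affine = { x: 1n, y: mod(-2n) };
// 2G via the doubling slope 3x^2 / (2y), used only to obtain a second distinct point.
const lamDbl = mod(3n * G.x * G.x * inv(2n * G.y));
const twoGx = mod(lamDbl * lamDbl - 2n * G.x);
const twoG: Affine = { x: twoGx, y: mod(lamDbl * (G.x - twoGx) - G.y) };

console.log(isOnCurve(naiveAdd(G, twoG))); // true:  generic case, dx != 0
console.log(isOnCurve(naiveAdd(G, G)));    // false: P == acc, needs the doubling slope
console.log(isOnCurve(naiveAdd(G, negG))); // false: P == -acc, result should be infinity
```

Both exceptional inputs come back as an ordinary-looking (x, y) pair that fails the curve equation, which matches the "off-curve garbage" symptom.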

A second, latent bug was also found and fixed: partial_dest is allocated for
the host-max thread count (streamNumThreads, 8192) but only the dispatched
threads initialise their slots; the host clears the buffer to 0, and the old
encoding read 0 as bucket id 0, linking bogus (0,0) partials into bucket
0's combine list. (It happened to land on window 0's zero-digit column, which the
reduce drops, so it was silent in the final result but corrupted bucket_sums[0].)
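
The latent partial_dest issue is a sentinel-encoding problem, and a small host-side model makes it concrete. This is a sketch under assumptions: the 8-slot buffer stands in for the 8192-slot allocation, and `writePartialDest` / `linkPartials` are hypothetical names, not functions from the PR. It compares reading the raw value as a bucket id against a `bucket_id + 1` encoding where 0 means empty.

```typescript
// Hypothetical size: the real buffer holds streamNumThreads (8192) slots.
const ALLOCATED_SLOTS = 8;

// Buffer is cleared to 0, then only the dispatched threads write their slot.
function writePartialDest(bucketIds: number[], oneIndexed: boolean): Uint32Array {
  const dest = new Uint32Array(ALLOCATED_SLOTS); // zero-filled, like the host clear
  bucketIds.forEach((bucket, thread) => {
    dest[thread] = oneIndexed ? bucket + 1 : bucket;
  });
  return dest;
}

// Index pass: group slot numbers by the bucket each slot claims to belong to.
function linkPartials(dest: Uint32Array, oneIndexed: boolean): Map<number, number[]> {
  const lists = new Map<number, number[]>();
  dest.forEach((value, slot) => {
    if (oneIndexed && value === 0) return; // 0 = empty slot, never linked
    const bucket = oneIndexed ? value - 1 : value;
    const list = lists.get(bucket) ?? [];
    list.push(slot);
    lists.set(bucket, list);
  });
  return lists;
}

const dispatched = [3, 0, 7, 3, 5]; // 5 threads ran; slots 5..7 were never written
const oldLists = linkPartials(writePartialDest(dispatched, false), false);
const newLists = linkPartials(writePartialDest(dispatched, true), true);

console.log(oldLists.get(0)); // [1, 5, 6, 7]: one real partial plus three bogus ones
console.log(newLists.get(0)); // [1]
```

With the 0-as-bucket-0 reading, every un-dispatched slot is linked into bucket 0; with the 1-indexed reading those slots are skipped.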

The fix (WGSL only, in-scope = the bucket_sums path)

ba_walker_combine — exception-safe (complete) affine accumulation: detect
dx==0, branch to the doubling slope when P==acc, and track an explicit
identity flag for the P==−acc (infinity) case; a bucket that sums to identity
is written as (0,0) (the reduce already marks all-zero buckets not-present).
ba_stream_walker + ba_walker_partials_index — make partial_dest
1-indexed (bucket_id + 1, 0 = empty) so the host's clear-to-0 means
"empty" and over-allocated/un-dispatched slots can never be mistaken for bucket 0.
Regenerated wgsl/_generated/shaders.ts (node src/msm_webgpu/scripts/inline-wgsl.mjs).
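
For reviewers who want the combine logic in a form that runs on the host, here is a minimal TypeScript sketch of exception-safe accumulation. It is not the WGSL from this PR: it assumes BN254 (y² = x³ + 3), assumes every partial is a reduced, non-identity, on-curve point, and the function name `combine` is illustrative.

```typescript
// Assumption: BN254 base field, curve y^2 = x^3 + 3, generator (1, 2).
const P_MOD = 21888242871839275222246405745257275088696311157297823662689037894645226208583n;

type Affine = { x: bigint; y: bigint };

const mod = (a: bigint): bigint => ((a % P_MOD) + P_MOD) % P_MOD;

function modPow(base: bigint, exp: bigint): bigint {
  let result = 1n;
  let b = mod(base);
  for (let e = exp; e > 0n; e >>= 1n) {
    if ((e & 1n) === 1n) result = mod(result * b);
    b = mod(b * b);
  }
  return result;
}

const inv = (a: bigint): bigint => modPow(a, P_MOD - 2n);

const isOnCurve = (p: Affine): boolean =>
  mod(p.y * p.y) === mod(p.x * p.x * p.x + 3n);

// Sums a bucket's partials; an identity result is encoded as (0, 0).
function combine(partials: Affine[]): Affine {
  let acc: Affine = { x: 0n, y: 0n };
  let accIsIdentity = true; // explicit flag instead of a special coordinate value
  for (const p of partials) {
    if (accIsIdentity) {
      acc = p;
      accIsIdentity = false;
      continue;
    }
    const dx = mod(p.x - acc.x);
    let lam: bigint;
    if (dx !== 0n) {
      lam = mod((p.y - acc.y) * inv(dx));              // generic chord slope
    } else if (p.y === acc.y && acc.y !== 0n) {
      lam = mod(3n * acc.x * acc.x * inv(2n * acc.y)); // P == acc: tangent slope
    } else {
      accIsIdentity = true;                            // P == -acc: sum is infinity
      continue;
    }
    const x3 = mod(lam * lam - acc.x - p.x);
    const y3 = mod(lam * (acc.x - x3) - acc.y);
    acc = { x: x3, y: y3 };
  }
  return accIsIdentity ? { x: 0n, y: 0n } : acc;
}

const G: Affine = { x: 1n, y: 2n };
const negG: Affine = { x: 1n, y: mod(-2n) };
const twoG = combine([G, G]);

console.log(isOnCurve(twoG));   // true
console.log(combine([G, negG])); // { x: 0n, y: 0n }
```

Because every exceptional prefix is handled, the result no longer depends on the order in which partials are visited: `combine([G, G, twoG])` and `combine([twoG, G, G])` both yield 4G.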

Proof — repeated-green cross-check (headless SwiftShader, GPU vs @noble/curves)

dev/msm-webgpu/msm-correctness.* runs the full MsmV2 pipeline (incl. the
stream-walker) and checks, per run: final MSM == noble, every bucket_sums
on-curve, every per-window sum on-curve. Run as a sweep over seeds × reps
(re-running re-traverses the CAS list, so any surviving non-determinism shows up).
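
The sweep structure can be sketched as follows. This is an illustration of the seeds × logn × reps loop and its three per-run checks, not the actual dev/msm-webgpu harness: the `RunResult` shape, the `digest` field used to compare repeated runs, and the stand-in `fakeRun` are all assumptions.

```typescript
// Hypothetical result shape; the real harness output format is not shown in this PR.
type RunResult = { offCurveBuckets: number; matchesReference: boolean; digest: string };
type Runner = (seed: number, logn: number) => RunResult;
type SweepSummary = { configs: number; onCurve: number; correct: number; deterministic: boolean };

function sweep(seeds: number[], logns: number[], reps: number, run: Runner): SweepSummary {
  const summary: SweepSummary = { configs: 0, onCurve: 0, correct: 0, deterministic: true };
  for (const seed of seeds) {
    for (const logn of logns) {
      let firstDigest: string | undefined;
      for (let rep = 0; rep < reps; rep++) {
        const res = run(seed, logn);
        summary.configs++;
        if (res.offCurveBuckets === 0) summary.onCurve++;
        if (res.matchesReference) summary.correct++;
        // Repeated runs of the same (seed, logn) must agree bit-for-bit.
        if (firstDigest === undefined) firstDigest = res.digest;
        else if (res.digest !== firstDigest) summary.deterministic = false;
      }
    }
  }
  return summary;
}

// Stand-in runner so the sketch executes without a GPU.
const fakeRun: Runner = (seed, logn) => ({
  offCurveBuckets: 0,
  matchesReference: true,
  digest: `${seed}:${logn}`,
});

const seeds = Array.from({ length: 8 }, (_, i) => i + 1);
const summary = sweep(seeds, [8, 10], 3, fakeRun);
console.log(`${summary.onCurve}/${summary.configs}`); // 48/48
```

The 8 × 2 × 3 = 48 count matches the "realistic random" row of the results table below.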

Input distribution	Configs	bucket_sums on-curve	full MSM == noble
Realistic random (Pᵢ = rᵢ·G)	8 seeds × logn{8,10} × 3 reps = 48	48/48	48/48
Hot buckets (64-scalar pool, ~16–64 pts/bucket)	3 seeds × logn{10,12} × 2 reps = 12	12/12	12/12
AP (arithmetic-progression) points (the harness that originally exposed the bug)	8 seeds × logn{8,10} × 2 reps = 32	32/32	30/32 (2 hit the reduce, see scope note)

Before the fix the same harness returned off-curve results at every size
(logn 8–16) with the mismatch set changing run-to-run. Full logs:
https://gist.github.com/AztecBot/96a1697838df66bf688f51906fe8e814

Every bucket_sums value is on-curve (off=0) in every configuration tested
— including the AP-points harness that originally exposed the bug and the
hot-bucket stress — and results are bit-identical across repeated runs (the
non-determinism is gone).

Scope note (out of scope: the shared affine reduce)

The same "affine add assumes no dx==0" pattern also exists, by explicit
design, in the shared ba_reduce_level_bench kernel (its header: "Point-equality
(P=±Q) handling is omitted — the algorithm assumes uniformly-random inputs with
no point collisions"). For realistic random/SRS-like points it never triggers
(100% green above). It can trigger only under deliberately structured inputs
(an arithmetic-progression point set, or ≤8 distinct buckets per window): in
those cases bucket_sums is still 100% correct (this PR), but the reduce can
emit one off-curve window. That is a separate, pre-existing limitation of the
reduce (used by the V2 pair-tree path too), outside this PR's stream-walker
bucket_sums scope. It can be fixed with the same complete-addition pattern,
branchlessly, by selecting 2·y_d as the batched denominator for doubling
candidates — happy to do that as a follow-up if wanted.
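
A possible shape for that follow-up, sketched in TypeScript rather than WGSL: each pair selects its numerator and denominator up front, so one batched (Montgomery-trick) inversion serves chord and tangent slopes alike. This is an assumption-laden illustration, not the reduce kernel. It assumes BN254 (y² = x³ + 3), and the handling of P == −Q (substitute denominator 1 and flag the output as identity) is my own choice, since the note above only covers the doubling case. The ternaries stand in for what would be a `select()` in WGSL.

```typescript
// Assumption: BN254 base field, curve y^2 = x^3 + 3, generator (1, 2).
const P_MOD = 21888242871839275222246405745257275088696311157297823662689037894645226208583n;

type Affine = { x: bigint; y: bigint };

const mod = (a: bigint): bigint => ((a % P_MOD) + P_MOD) % P_MOD;

function modPow(base: bigint, exp: bigint): bigint {
  let result = 1n;
  let b = mod(base);
  for (let e = exp; e > 0n; e >>= 1n) {
    if ((e & 1n) === 1n) result = mod(result * b);
    b = mod(b * b);
  }
  return result;
}

const inv = (a: bigint): bigint => modPow(a, P_MOD - 2n);

const isOnCurve = (p: Affine): boolean =>
  mod(p.y * p.y) === mod(p.x * p.x * p.x + 3n);

// Montgomery batch inversion: one field inversion for the whole array.
// Every input must be non-zero, which batchedAdd guarantees below.
function batchInverse(values: bigint[]): bigint[] {
  const prefix: bigint[] = [];
  let running = 1n;
  for (const v of values) {
    prefix.push(running);
    running = mod(running * v);
  }
  let invRunning = inv(running);
  const out: bigint[] = new Array<bigint>(values.length).fill(0n);
  for (let i = values.length - 1; i >= 0; i--) {
    out[i] = mod(invRunning * prefix[i]);
    invRunning = mod(invRunning * values[i]);
  }
  return out;
}

// Adds each pair; identity results are encoded as (0, 0).
function batchedAdd(pairs: [Affine, Affine][]): Affine[] {
  const nums: bigint[] = [];
  const dens: bigint[] = [];
  const toIdentity: boolean[] = [];
  for (const [p, q] of pairs) {
    const dx = mod(q.x - p.x);
    const dy = mod(q.y - p.y);
    const same = dx === 0n && dy === 0n && p.y !== 0n; // doubling candidate
    const opposite = dx === 0n && !same;               // P == -Q
    nums.push(same ? mod(3n * p.x * p.x) : dy);
    dens.push(opposite ? 1n : same ? mod(2n * p.y) : dx);
    toIdentity.push(opposite);
  }
  const invs = batchInverse(dens);
  return pairs.map(([p, q], i) => {
    if (toIdentity[i]) return { x: 0n, y: 0n };
    const lam = mod(nums[i] * invs[i]);
    const x3 = mod(lam * lam - p.x - q.x);
    return { x: x3, y: mod(lam * (p.x - x3) - p.y) };
  });
}

const G: Affine = { x: 1n, y: 2n };
const negG: Affine = { x: 1n, y: mod(-2n) };
const twoG = batchedAdd([[G, G]])[0];
const sums = batchedAdd([[G, G], [G, twoG], [G, negG], [twoG, G]]);
console.log(sums.slice(0, 2).every(isOnCurve)); // true
```

Keeping a zero out of the denominator array matters because a single zero would zero the running product and corrupt every inverse in the batch, not just one lane.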

Commit 17a47ec

AztecBot added the claudebox label (owned by claudebox; it can push to this PR) on May 30, 2026.

AztecBot wants to merge 1 commit into stream-walker-impl from cb/msm-walker-combine-correctness
