feat(bb/msm): mobile-first MSM — SwiftShader correctness harness + GLV design by AztecBot · Pull Request #23728 · AztecProtocol/aztec-packages

AztecBot · 2026-05-30T00:07:51Z

Mobile-first BN254 WebGPU MSM — GLV decomposition

Autonomous work toward a memory- and time-optimal BN254 MSM for laptop/mobile GPUs (Apple TBDR, Adreno, Mali) under the ≤100 MB algorithm-buffer budget and 16/32 KB workgroup-memory limits. Full design + budget: barretenberg/ts/src/msm_webgpu/MOBILE_MSM_DESIGN.md.

Algorithmic contribution: GLV endomorphism (`cuzk/glv.ts` + `scalarBits` knob)

MSM_DESIGN_ANALYSIS.md §3.6 flags GLV as the top unexploited win ("wins everywhere", used by neither baseline). BN254 has φ(x,y)=(βx,y)=[λ]P; every scalar splits as k ≡ k₁ + λ·k₂ (mod r) with |kᵢ| < 2¹²⁷. So an n-pair, 254-bit MSM becomes a 2n-pair, 128-bit MSM Σ k₁ᵢPᵢ + Σ k₂ᵢφPᵢ. Constants derived offline (Tonelli-Shanks cube roots, Gauss lattice reduction), verified against the group law, re-asserted at module load.
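The split k ≡ k₁ + λ·k₂ (mod r) can be illustrated with a minimal sketch. This is not the PR's `cuzk/glv.ts`: the function names (`cubeRootOfUnity`, `lagrangeReduce`, `glvDecompose`) are hypothetical, and λ and the lattice basis are derived at run time rather than taken from the offline-derived constants the PR describes. Which of the two non-trivial cube roots of unity the PR uses is not stated, so this sketch simply takes the first one it finds.

```typescript
// BN254 scalar-field order r (public curve parameter).
const R = 21888242871839275222246405745257275088548364400416034343698204186575808495617n;

type Vec2 = [bigint, bigint];

const mod = (a: bigint, m: bigint): bigint => ((a % m) + m) % m;
const abs = (a: bigint): bigint => (a < 0n ? -a : a);

function modPow(base: bigint, exp: bigint, m: bigint): bigint {
  let result = 1n;
  let b = mod(base, m);
  let e = exp;
  while (e > 0n) {
    if ((e & 1n) === 1n) result = (result * b) % m;
    b = (b * b) % m;
    e >>= 1n;
  }
  return result;
}

// Nearest-integer division for bigints (ties round up).
function roundDiv(a: bigint, b: bigint): bigint {
  if (b < 0n) { a = -a; b = -b; }
  const num = 2n * a + b;
  const den = 2n * b;
  const q = num / den; // truncates toward zero
  return num % den !== 0n && num < 0n ? q - 1n : q;
}

// A non-trivial cube root of unity mod r: g^((r-1)/3) for the first g where it is not 1.
function cubeRootOfUnity(): bigint {
  for (let g = 2n; ; g++) {
    const w = modPow(g, (R - 1n) / 3n, R);
    if (w !== 1n) return w;
  }
}

const dot = (u: Vec2, v: Vec2): bigint => u[0] * v[0] + u[1] * v[1];

// Lagrange-Gauss reduction of a 2-D lattice basis.
function lagrangeReduce(v1: Vec2, v2: Vec2): [Vec2, Vec2] {
  let u: Vec2 = v1;
  let v: Vec2 = v2;
  for (;;) {
    if (dot(v, v) < dot(u, u)) { const t = u; u = v; v = t; }
    const m = roundDiv(dot(u, v), dot(u, u));
    if (m === 0n) return [u, v];
    v = [v[0] - m * u[0], v[1] - m * u[1]];
  }
}

const LAMBDA = cubeRootOfUnity();
// Lattice of pairs (a, b) with a + b*LAMBDA ≡ 0 (mod r), spanned by (r, 0) and (-LAMBDA, 1).
const [V1, V2] = lagrangeReduce([R, 0n], [-LAMBDA, 1n]);

// Babai rounding: subtract the lattice vector closest to (k, 0).
function glvDecompose(k: bigint): Vec2 {
  const [a1, b1] = V1;
  const [a2, b2] = V2;
  const det = a1 * b2 - a2 * b1; // +r or -r
  const c1 = roundDiv(k * b2, det);
  const c2 = roundDiv(-k * b1, det);
  return [k - c1 * a1 - c2 * a2, -c1 * b1 - c2 * b2];
}
```

The returned halves are signed. In an MSM, a negative kᵢ would typically be handled by negating the corresponding point and using |kᵢ| as the scalar; whether the PR does this on the host or in the shader is not stated here.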

Total accumulation work is invariant (2n·T′ ≈ n·T nonzero digits), but halving the scalar bit length halves the window count T, which:

| | baseline | GLV | effect |
|---|---|---|---|
| BPR + transpose-scan (`T·2ᶜ`, the n-independent term; 37 % of GPU wall time @2¹⁶) | T | T/2 | time reduced (halved) |
| Horner (`T`) | T | T/2 | time reduced |
| accumulation (`n·T`) | n·T | n·T | flat (no regression) |
| algorithm buffers @2¹⁷ | 12.52 MB | 8.02 MB | −36 % |
| algorithm buffers @2²⁰ | 43.33 MB | 29.56 MB | −32 % |

(Buffer totals computed from the real allocation formulas in ba_stream_plan.ts, holding thread count at the work-justified baseline.) Mobile-agnostic: φ is one Fq-multiply + a free coordinate copy, zero extra workgroup memory.

Correctness — validated under SwiftShader (GPU-less host)

New WASM-free, network-free WebGPU correctness oracle vs noble (dev/msm-webgpu/xcheck.*), run under SwiftShader:

baseline    logN=8 PASS (c=4)   logN=10 PASS (c=8)
GLV (128b)  logN=8 PASS (c=5)   logN=10 PASS (c=8)   max |kᵢ|=126 bits

Commits

SwiftShader correctness harness (xcheck.{html,ts} + driver)
cuzk/glv.ts GLV decomposition + MsmV2.scalarBits knob
MOBILE_MSM_DESIGN.md design + memory/time budget
?glv=1 knob in the bench harness for on-device runs

Status / blockers (honest)

No local GPU: correctness was validated only under SwiftShader at logN = 8 and 10 (per task constraints).
BrowserStack: blocked. Both seats (2, shared across ~10 agents) were occupied throughout this session, and running jobs were never interfered with. The on-device path is turnkey: cloudflared is installed and ?glv=1 is wired into the bench autorun (the WASM/noble cross-check then also confirms GLV correctness on the real device). To reproduce on a free seat, run
`node dev/msm-webgpu/scripts/run-browserstack.mjs --target macos --n 17 --autorun msm-bench`
and then the same command with &glv=1 appended to the page URL.
Peak GPU memory is not measurable via WebGPU (no API; performance.memory is JS heap). The credible memory evidence is therefore the analytical buffer budget above, derived from the real allocation formulas.
Designed, not yet wired (full memory-optimality): work-invariant thread allocation under GLV, and on-the-fly φ point-fetch to avoid doubling the input SRS — see design doc §4/§6.


Add BN254 GLV scalar decomposition (cuzk/glv.ts): splits each 254-bit scalar k into (k1, k2) with |ki| < 2^127 via Gauss-reduced lattice + Babai rounding, so an n-pair 254-bit MSM becomes a 2n-pair 128-bit MSM (Sigma k1 P + k2 phiP). Constants derived offline (Tonelli-Shanks cube roots, Gauss reduction), verified against the group law, and re-asserted at module load. MsmV2 gains a scalarBits config knob: numWindows = ceil(scalarBits / c), so halving the scalar bit length under GLV halves the window count T (and thus the bucket-reduction work and bucket_sums buffer). Default 254 preserves existing behaviour. Cross-check harness (dev/msm-webgpu/xcheck.*): WASM-free, network-free WebGPU MSM correctness oracle vs noble; runs under SwiftShader on a GPU-less host. GLV mode (?glv=1) validated PASS at logn=8,10.
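To make the window-count arithmetic concrete, here is a small sketch of the stated formula numWindows = ceil(scalarBits / c). The helper names `numWindows` and `bucketTerm` are illustrative and not taken from MsmV2; `bucketTerm` models only the n-independent `T·2ᶜ` term from the PR description.

```typescript
// Window count T for a given scalar bit length and window width c.
function numWindows(scalarBits: number, c: number): number {
  return Math.ceil(scalarBits / c);
}

// n-independent bucket term T * 2^c (number of buckets across all windows).
function bucketTerm(scalarBits: number, c: number): number {
  return numWindows(scalarBits, c) * (1 << c);
}

// c = 8: 254-bit scalars need 32 windows, 128-bit GLV halves need 16.
const baselineWindows = numWindows(254, 8); // 32
const glvWindows = numWindows(128, 8);      // 16
```

The halving is exact only when both bit lengths round up compatibly; for example, with c = 5 the counts are 51 and 26.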


Strictly additive: routes the WebGPU path through GLV decomposition (2n pairs, scalarBits=128) when ?glv=1 is set. The WASM/noble cross-check still validates the result (GLV output == original MSM), so a BrowserStack run yields on-device GLV correctness plus timing. Default runs unaffected.
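A sketch of how such a query-string knob might be read follows. The `MsmRunConfig` shape and the `configFromQuery` name are assumptions made for illustration; only the ?glv=1 parameter, the 2n-pair routing, and the 254/128 values of scalarBits come from the description above.

```typescript
// Hypothetical run configuration; field names are illustrative only.
interface MsmRunConfig {
  scalarBits: number; // 254 by default, 128 under GLV
  useGlv: boolean;    // when true, the caller feeds 2n (point, half-scalar) pairs
}

// Parse a location.search-style string such as "?n=17&glv=1".
function configFromQuery(search: string): MsmRunConfig {
  const useGlv = new URLSearchParams(search).get("glv") === "1";
  return { scalarBits: useGlv ? 128 : 254, useGlv };
}
```

Any value other than "1", or a missing parameter, leaves the default 254-bit path in place, which matches the "default runs unaffected" behaviour described.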

feat(bb/msm): mobile-first MSM — SwiftShader correctness harness + GLV design (28accea)

AztecBot added the claudebox label (owned by claudebox; it can push to this PR) on May 30, 2026

AztecBot added 3 commits May 30, 2026 00:14

docs(bb/msm): mobile-first MSM design — GLV memory/time budget + validation status (78eee50)

AztecBot wants to merge 4 commits into stream-walker-impl from cb/msm-opt/glv-mobile-7f3a
