feat(bb/msm): mobile-first MSM — SwiftShader correctness harness + GLV design#23728
Draft
AztecBot wants to merge 4 commits into
Add BN254 GLV scalar decomposition (cuzk/glv.ts): splits each 254-bit scalar k into (k1, k2) with |ki| < 2^127 via Gauss-reduced lattice + Babai rounding, so an n-pair 254-bit MSM becomes a 2n-pair 128-bit MSM (Sigma k1 P + k2 phiP). Constants derived offline (Tonelli-Shanks cube roots, Gauss reduction), verified against the group law, and re-asserted at module load.

MsmV2 gains a scalarBits config knob: numWindows = ceil(scalarBits / c), so halving the scalar bit length under GLV halves the window count T (and thus the bucket-reduction work and bucket_sums buffer). Default 254 preserves existing behaviour.

Cross-check harness (dev/msm-webgpu/xcheck.*): WASM-free, network-free WebGPU MSM correctness oracle vs noble; runs under SwiftShader on a GPU-less host. GLV mode (?glv=1) validated PASS at logn=8,10.
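The window-count arithmetic behind the scalarBits knob can be illustrated with a small sketch. Only the formula numWindows = ceil(scalarBits / c) and the T·2^c bucket term come from this PR; the function names and the window width c = 16 are assumptions chosen for illustration and are not taken from MsmV2.

```typescript
// Illustrative only: helper names and the window width below are hypothetical.

/** Number of Pippenger windows T for a given scalar bit length and window width c. */
function numWindows(scalarBits: number, c: number): number {
  return Math.ceil(scalarBits / c);
}

/** The n-independent bucket term T * 2^c (bucket count across all windows). */
function bucketReductionTerm(scalarBits: number, c: number): number {
  return numWindows(scalarBits, c) * 2 ** c;
}

const c = 16; // assumed window width, not a value taken from the PR

console.log(numWindows(254, c), numWindows(128, c)); // 16 8
console.log(bucketReductionTerm(254, c) / bucketReductionTerm(128, c)); // 2
```

With this assumed width the window count drops from 16 to 8, so every term proportional to T halves, while the number of (scalar, point) pairs doubles.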
Strictly additive: routes the WebGPU path through GLV decomposition (2n pairs, scalarBits=128) when ?glv=1 is set. The WASM/noble cross-check still validates the result (GLV output == original MSM), so a BrowserStack run yields on-device GLV correctness plus timing. Default runs unaffected.
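A minimal sketch of how a ?glv=1 switch might select the run configuration follows. The interface, the function name, and the field names are hypothetical; the only facts taken from the commit message are the query parameter itself, the 2n-pair expansion, and the scalarBits values 128 and 254.

```typescript
// Hypothetical shape; the real bench harness may structure this differently.
interface MsmRunConfig {
  glv: boolean;
  scalarBits: number;      // 254 by default, 128 under GLV
  pairsPerInput: number;   // 1 by default, 2 under GLV (P and phi(P))
}

/** Derive the run configuration from a URL query string such as "?glv=1". */
function configFromQuery(search: string): MsmRunConfig {
  const glv = new URLSearchParams(search).get("glv") === "1";
  return glv
    ? { glv, scalarBits: 128, pairsPerInput: 2 }
    : { glv, scalarBits: 254, pairsPerInput: 1 };
}

console.log(configFromQuery("?glv=1"));
console.log(configFromQuery(""));
```

Keeping the default branch identical to the pre-GLV configuration is what makes the change strictly additive in this sketch.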
Mobile-first BN254 WebGPU MSM — GLV decomposition
Autonomous work toward a memory- and time-optimal BN254 MSM for laptop/mobile GPUs (Apple TBDR, Adreno, Mali) under the ≤100 MB algorithm-buffer budget and 16/32 KB workgroup-memory limits. Full design + budget: `barretenberg/ts/src/msm_webgpu/MOBILE_MSM_DESIGN.md`.

Algorithmic contribution: GLV endomorphism (`cuzk/glv.ts` + `scalarBits` knob)

`MSM_DESIGN_ANALYSIS.md` §3.6 flags GLV as the top unexploited win ("wins everywhere", used by neither baseline). BN254 has φ(x,y)=(βx,y)=[λ]P; every scalar splits as `k ≡ k₁ + λ·k₂ (mod r)` with `|kᵢ| < 2¹²⁷`. So an n-pair, 254-bit MSM becomes a 2n-pair, 128-bit MSM `Σ k₁ᵢPᵢ + Σ k₂ᵢφPᵢ`. Constants derived offline (Tonelli-Shanks cube roots, Gauss lattice reduction), verified against the group law, re-asserted at module load.

Total accumulation work is invariant (`2n·T′ ≈ n·T` nonzero digits), but halving the scalar bit length halves the window count T, which:

- […] (`T·2ᶜ`, the n-independent term; 37 % of GPU wall @2¹⁶)
- […] (`T`)
- […] (`n·T`)

(Buffer totals computed from the real allocation formulas in `ba_stream_plan.ts`, holding thread count at the work-justified baseline.) Mobile-agnostic: φ is one Fq-multiply + a free coordinate copy, zero extra workgroup memory.

Correctness: validated under SwiftShader (GPU-less host)

New WASM-free, network-free WebGPU correctness oracle vs noble (`dev/msm-webgpu/xcheck.*`), run under SwiftShader.

Commits

- […] (`xcheck.{html,ts}` + driver)
- `cuzk/glv.ts` GLV decomposition + `MsmV2.scalarBits` knob
- `MOBILE_MSM_DESIGN.md` design + memory/time budget
- `?glv=1` knob in the bench harness for on-device runs

Status / blockers (honest)

- […] `?glv=1` wired into the bench autorun (the WASM/noble cross-check then also confirms GLV correctness on the real device). Reproduce on a free seat: `node dev/msm-webgpu/scripts/run-browserstack.mjs --target macos --n 17 --autorun msm-bench` and the same with `&glv=1` appended to the page URL.
- […] `performance.memory` is JS heap). The credible memory evidence is therefore the analytical buffer budget above, derived from the real allocation formulas.
- […] `φ` point-fetch to avoid doubling the input SRS; see design doc §4/§6.
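For readers unfamiliar with the scalar split `k ≡ k₁ + λ·k₂ (mod r)` used in the GLV section, here is a minimal BigInt sketch of Gauss (Lagrange) lattice reduction followed by Babai rounding. It is not the code in `cuzk/glv.ts`: all names are hypothetical, λ is obtained here by exponentiation rather than the Tonelli-Shanks derivation the PR mentions, and the sketch does not check which of the two cube roots of unity pairs with the β used by φ. The only hardcoded constant is the BN254 scalar-field order r.

```typescript
type Vec = [bigint, bigint];

// BN254 scalar-field order r.
const R = 21888242871839275222246405745257275088548364400416034343698204186575808495617n;

const mod = (a: bigint, m: bigint): bigint => ((a % m) + m) % m;

function modPow(base: bigint, exp: bigint, m: bigint): bigint {
  let result = 1n;
  let b = mod(base, m);
  for (let e = exp; e > 0n; e >>= 1n) {
    if ((e & 1n) === 1n) result = (result * b) % m;
    b = (b * b) % m;
  }
  return result;
}

/** A primitive cube root of unity mod r (requires r ≡ 1 mod 3). Assumption: either root will do here. */
function cubeRootOfUnity(r: bigint): bigint {
  for (let g = 2n; ; g++) {
    const w = modPow(g, (r - 1n) / 3n, r);
    if (w !== 1n) return w;
  }
}

/** Nearest-integer division, floor(n/d + 1/2), valid for any signs. */
function roundDiv(n: bigint, d: bigint): bigint {
  if (d < 0n) { n = -n; d = -d; }
  const num = 2n * n + d;
  const den = 2n * d;
  let q = num / den; // BigInt division truncates toward zero
  if (num < 0n && num % den !== 0n) q -= 1n; // turn truncation into floor
  return q;
}

/** Lagrange-Gauss reduction of a 2D integer lattice basis. */
function gaussReduce(u: Vec, v: Vec): [Vec, Vec] {
  const norm2 = (a: Vec): bigint => a[0] * a[0] + a[1] * a[1];
  for (;;) {
    if (norm2(v) < norm2(u)) [u, v] = [v, u]; // keep u as the shorter vector
    const m = roundDiv(u[0] * v[0] + u[1] * v[1], norm2(u));
    if (m === 0n) return [u, v];
    v = [v[0] - m * u[0], v[1] - m * u[1]];
  }
}

/** Babai rounding: subtract the lattice point nearest to (k, 0). */
function glvDecompose(k: bigint, basis: [Vec, Vec]): Vec {
  const [u, v] = basis;
  const det = u[0] * v[1] - v[0] * u[1]; // equals +r or -r
  const c1 = roundDiv(k * v[1], det);
  const c2 = roundDiv(-k * u[1], det);
  return [k - c1 * u[0] - c2 * v[0], -c1 * u[1] - c2 * v[1]];
}

const LAMBDA = cubeRootOfUnity(R);
// Every lattice vector (a, b) satisfies a + b*LAMBDA ≡ 0 (mod r).
const BASIS = gaussReduce([R, 0n], [-LAMBDA, 1n]);

const k = 123456789012345678901234567890123456789012345678901234567890123456789012345n;
const [k1, k2] = glvDecompose(k, BASIS);
console.log(k1, k2);
console.log(mod(k1 + LAMBDA * k2 - k, R) === 0n); // recombination check
```

Because every basis vector lies in the lattice, the recombination identity holds for any rounding; the rounding only controls how small `k₁` and `k₂` are. Signed halves would still need a sign-handling step (for example, negating the corresponding point) before being fed to an unsigned windowed MSM, which this sketch leaves out.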