Skip to content

feat: cache verifying key + commitments (recursion opt)#727

Open
Oppen wants to merge 7 commits into
mainfrom
pr/vkey
Open

feat: cache verifying key + commitments (recursion opt)#727
Oppen wants to merge 7 commits into
mainfrom
pr/vkey

Conversation

@Oppen

@Oppen Oppen commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Add VmVerifyingKey (prover/src/vkey.rs): a host-derived cache of the five preprocessed-table Merkle commitments (BITWISE, DECODE, REGISTER, KECCAK_RC, per-PAGE) plus the ProofOptions they were derived under. VmAirs::new_with_vkey / verify_with_options_with_vkey take the cached commitments instead of recomputing them — recomputation is ~87% of verifier cycles inside the recursion guest.

Soundness

The vkey is trusted input — Fiat-Shamir only catches a vkey inconsistent with the proof (post-hoc tampering), not a coordinated prover that commits to a forged preprocessed table and supplies a matching vkey. The binding is vk_digest = keccak(postcard(vkey)):

  • The prover stamps it into VmProof.vk_digest and it is bound into the Fiat-Shamir statement (LAMBDAVM_STARK_STATEMENT_V3).
  • The verifier recomputes the digest from its own vkey (caller-supplied, or derived from the trusted ELF) and rejects with an explicit InvalidVerifyingKey error before any STARK work. Malformed vkeys (wrong version, short pages vec, mismatched options) are rejected the same way instead of panicking.
  • The recursion guest commits vk_digest ‖ inner public output instead of [1], so the outer verifier checks which vkey was used in-guest against a digest derived from the trusted inner ELF — every guest input is prover-supplied, so this outer check is what makes the recursion sound. The smoke test performs it.
  • The digest covers ProofOptions because FRI query count and grinding factor affect soundness but influence no commitment; nothing else would pin them.

The recursion guest and the smoke test encode the vkey into the postcard blob (VmProof, elf, opts, vkey).

Breakdown


RECURSION GUEST PROFILE (blowup=2, 1 queries)

Total cycles : 114975927
Exec time : 2.648427084s

Per-step cycle breakdown (latest-marker state machine):
bucket cycles %
0. setup (alloc init + postcard decode) 21816885 18.98%

  1. airs_and_bus_balance (Elf::load/VmAirs::new preprocessed FFT+Merkle/bus balance) 6483454 5.64%
  2. multi_verify setup (transcript replay phase A/B, per-table fork) 130163 0.11%
  3. step 1: replay_rounds_after_round_1 37887686 32.95%
  4. step 2: verify_claimed_composition_polynomial 16500501 14.35%
  5. step 3: verify_fri 11676340 10.16%
  6. step 4: verify_trace_and_composition_openings (+ wrap-up) 20480898 17.81%

Unique PCs : 89622


RECURSION GUEST PROFILE (blowup=8, 73 queries)

Total cycles : 2968430174
Exec time : 91.293472417s

Per-step cycle breakdown (latest-marker state machine):
bucket cycles %
0. setup (alloc init + postcard decode) 698028452 23.52%

  1. airs_and_bus_balance (Elf::load/VmAirs::new preprocessed FFT+Merkle/bus balance) 6483460 0.22%
  2. multi_verify setup (transcript replay phase A/B, per-table fork) 130125 0.00%
  3. step 1: replay_rounds_after_round_1 51185741 1.72%
  4. step 2: verify_claimed_composition_polynomial 16504368 0.56%
  5. step 3: verify_fri 926553719 31.21%
  6. step 4: verify_trace_and_composition_openings (+ wrap-up) 1269544309 42.77%

Unique PCs : 89822


From baseline on #726:


bucket cycles %
  1. setup (alloc, pre-decode) | 21,923,793 | 0.42%
  2. decode → Elf::load/VmAirs::new/transcript setup | 5,129,300,006 | 97.93%
  3. step 1: replay_rounds_after_round_1 | 770,018 | 0.01%
  4. step 2: verify_claimed_composition_polynomial | 54,830 | 0.00%
  5. step 3: verify_fri | 1,782,977 | 0.03%
  6. step 4: verify_trace_and_composition_openings (+ wrap-up) | 84,104,530 | 1.61%

bucket cycles %
  1. setup (alloc, pre-decode) | 702,172,380 | 3.10%
  2. decode → Elf::load/VmAirs::new/transcript setup | 19,700,147,397 | 86.90%
  3. step 1: replay_rounds_after_round_1 | 1,282,633 | 0.01%
  4. step 2: verify_claimed_composition_polynomial | 54,462 | 0.00%
  5. step 3: verify_fri | 151,180,914 | 0.67%
  6. step 4: verify_trace_and_composition_openings (+ wrap-up) | 2,115,837,582 | 9.33%

While I left the combined step in the description, note 1. collapsed 1. and 2. in the current one and took 86.90% vs <25% and 97.93% vs <6%, and total cycles diminish by comparable amounts.

Validation

  • make test passes (only the 2 CUDA tests fail — no GPU); vkey_tests (10, incl. digest/short-pages/options-mismatch rejection) and statement_tests pass.
  • make test-profile-recursion passes and prints the histograms:
    • single (1 query): 115.0M cycles (vs 10.06B without vkey, ~87x)
    • multiquery: 2.97B cycles (vs 42.3B without vkey, ~14x)
  • The digest binding costs nothing measurable: 114.98M vs 115.27M (single) and 2.968B vs 2.976B (multi) before it landed.

Stacked PR — final of the stack; review on top of #726.

@Oppen

Oppen commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

/ai-review

@github-actions

Copy link
Copy Markdown

Codex Code Review

Findings

  • Medium: prover/src/lib.rs indexes vk.pages[index] without validating the vkey shape. A malformed or stale serialized VmVerifyingKey with too few page entries will panic during verification for any ELF-data page at that index, instead of returning Err or Ok(false). This now matters because the recursion guest deserializes the vkey from private input and passes it into verification at bench_vs/lambda/recursion/src/main.rs. Validate vkey.version and vkey.pages.len() == page_configs.len() in verify_with_options_with_vkey before constructing AIRs, or use get(index) and reject cleanly.

No other safety, correctness, or significant performance issues found in the PR diff. Static review only; I did not build or run tests per the review constraints.

Comment thread prover/src/vkey.rs Outdated
Comment thread prover/src/lib.rs Outdated
@claude

claude Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review: cache verifying key + commitments (recursion opt)

The mechanical refactor is clean — the *_with_vkey variants thread an optional vkey through and the non-vkey paths stay byte-identical (they pass None). Page-config indexing alignment is handled carefully and precomputed_commitment_cached correctly mirrors the per-page choice in new_with_vkey. Two findings (inline):

1. High — Soundness rationale is incorrect. Sourcing the preprocessed-table commitments from a prover-supplied vkey defeats the verifier's critical soundness check (crypto/stark/src/verifier.rs:756-775), which is only sound because the verifier recomputes those commitments from trusted inputs. Fiat-Shamir prevents adaptive commitment swaps but not a coordinated prover that commits to a malicious preprocessed table from the start and supplies a matching vkey — challenges, openings and the equality check are then all consistent with the forged table, and a false statement verifies. This is only safe once the vkey digest is bound into the verified public statement (the deferred vk_digest in VmProof, checked by the outer verifier). Until then, don't use verify_with_options_with_vkey across a trust boundary, and please soften the doc/PR claims. The *_mismatch_rejects tests only cover post-hoc tampering, not a coordinated forge.

2. Medium — Panic on untrusted input. vk.pages[index] (prover/src/lib.rs) indexes a vec decoded from untrusted private input with no length check; a short pages vec panics inside the verifier. Use vk.pages.get(index).copied() and fall through to recompute (or validate pages.len() == page_configs.len() and reject).

Note: bench_vs/lambda/recursion/Cargo.lock churns many unrelated dependency versions downward (e.g. syn 2.0.118 -> 2.0.117, wasm-bindgen 0.2.126 -> 0.2.121). Likely an unintended lockfile regeneration on an older registry snapshot — worth confirming it's intended.

The production verify_with_options path is unaffected (defaults to vkey = None / recompute), so the risk is scoped to the experimental recursion guest. Given that scoping the perf win is real; the main ask is to fix the panic and correct the soundness framing before this is depended on for actual recursion soundness.

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

AI Review

PR #727 · 13 changed files

Findings

Status Sev Location Finding Found by
candidate low prover/src/lib.rs:163 VmProof gains required vk_digest field — breaks old proof artifacts minimax
minimax/MiniMax-M3
candidate low prover/src/vkey.rs:121 VmVerifyingKey::from_elf_and_options register_init footgun minimax
minimax/MiniMax-M3
candidate low prover/src/vkey.rs:154 compute_digest uses allocating to_allocvec per verify minimax
minimax/MiniMax-M3

Status column reflects the verdict from the verifier: deepseek-verifier (openrouter/deepseek/deepseek-v4-pro).

AI-001: VmProof gains required vk_digest field — breaks old proof artifacts
  • Status: candidate
  • Severity: low
  • Location: prover/src/lib.rs:163
  • Found by: minimax:minimax/MiniMax-M3
  • Verified by: -
  • Rejected by: -

Claim

Adding the required vk_digest: [u8; 32] field to VmProof makes any proof serialized before this PR fail to deserialize (no default, no #[serde(default)]). This is a wire-format breaking change.

Evidence

VmProof struct in prover/src/lib.rs:160-183 now requires vk_digest. Minimal tests (prover/src/tests/prove_elfs_tests.rs:141) set it to [0u8; 32] with a comment noting it's never read, suggesting backward compat is intentionally not preserved on the minimal path either.

Suggested fix

If backward compatibility with archived proof bundles matters, mark the field with #[serde(default)] and treat [0u8; 32] as a sentinel for "no vkey bound" (skip the digest check in that case). Otherwise document the breaking change clearly in the commit message / changelog.

AI-002: VmVerifyingKey::from_elf_and_options register_init footgun
  • Status: candidate
  • Severity: low
  • Location: prover/src/vkey.rs:121
  • Found by: minimax:minimax/MiniMax-M3
  • Verified by: -
  • Rejected by: -

Claim

VmVerifyingKey::from_elf_and_options takes register_init: Option&lt;&amp;[u32]&gt; and embeds the preprocessed commitment derived from it as vkey.register. If a caller passes Some(custom_init) here but VmAirs::new_with_vkey is invoked with a different register_init (or default entry-point), the AIR's preprocessed commitment won't match vkey.register, and the verifier's no-per-slot-check on register means the mismatch is only caught by the vk_digest comparison — which would actually fail cleanly, but the source of the mismatch is non-obvious.

Evidence

prover/src/vkey.rs:121-149 uses register_init to compute register field; prover/src/lib.rs:660-666 in VmAirs::new_with_vkey uses its own register_init parameter independently. Today all call sites pass None for both, so the paths agree, but the API permits divergence.

Suggested fix

Either drop the register_init parameter from from_elf_and_options and always derive from elf.entry_point, or add an explicit assertion that the AIR-construction register_init matches what was used to compute vkey.register.

AI-003: compute_digest uses allocating to_allocvec per verify
  • Status: candidate
  • Severity: low
  • Location: prover/src/vkey.rs:154
  • Found by: minimax:minimax/MiniMax-M3
  • Verified by: -
  • Rejected by: -

Claim

VmVerifyingKey::compute_digest calls postcard::to_allocvec(self) per invocation. This is called once inside verify_with_options_with_vkey and once inside verify_recursion_commitment for every verify. For typical vkey sizes (~300 bytes) this is negligible, but it allocates a fresh Vec per call — and the recursion guest could trigger this on hot paths.

Evidence

prover/src/vkey.rs:154-160 uses postcard::to_allocvec(self).expect(...). Same allocation pattern fires inside prover/src/lib.rs:1249 (verifier) and prover/src/lib.rs:233 (outer verify_recursion_commitment).

Suggested fix

If measured to matter, replace with a streaming hash (postcard::to_stdvec or a Feed adapter into Keccak256); otherwise leave alone — sizes are small.

Reviewer Lanes

Lane Model Prompt Status Findings
glm openrouter/z-ai/glm-5.2 general error: opencode failed (provider/auth/runtime error) and no findings were submitted 0
kimi openrouter/moonshotai/kimi-k2.7-code general error: opencode failed (provider/auth/runtime error) and no findings were submitted 0
minimax minimax/MiniMax-M3 general success 3
moonmath zro/minimax-m3 general error: agentic lane timed out after 1800s 0
nemotron openrouter/nvidia/nemotron-3-ultra-550b-a55b general error: opencode failed (provider/auth/runtime error) and no findings were submitted 0

Verification Lanes

Lane Model Status Confirmed Rejected Uncertain
deepseek-verifier openrouter/deepseek/deepseek-v4-pro error: opencode failed (provider/auth/runtime error) and no verifications were submitted 0 0 0

Native Codex and Claude reviews run separately and post their own comments. They are not included in this structured provenance report.

Raw lane outputs, candidates, final issues, and model metrics are uploaded as workflow artifacts.

@Oppen Oppen force-pushed the pr/vkey branch 4 times, most recently from b8aa349 to 7590c91 Compare July 1, 2026 17:42
@Oppen Oppen force-pushed the pr/vkey branch 2 times, most recently from 71e9375 to 32319b1 Compare July 2, 2026 18:37
@Oppen Oppen marked this pull request as ready for review July 2, 2026 19:17
@Oppen

Oppen commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

/ai-review

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codex Code Review

Findings

  • Low: prover/src/tests/recursion_smoke_test.rs:523 still decodes the recursion blob as (VmProof, Vec<u8>, ProofOptions), but the PR changed the producer and guest to (VmProof, Vec<u8>, ProofOptions, VmVerifyingKey). This ignored host guard no longer matches “exactly as the guest does” and will fail or stop covering the actual guest decode contract when manually enabled. Update it to decode the 4-tuple and verify via verify_with_options_with_vkey.

No production safety/security issues found in the changed code from static review. I did not build or run tests per the review constraints.

@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Review — vkey caching + digest binding

Reviewed the full diff (the PR branch isn't checked out in this sandbox; base main lacks these files, so this is diff-only).

Overall: solid. The soundness story is coherent and well-documented, and test coverage is thorough (per-table tamper rejection, short-pages, options-mismatch, wrong-version, round-trip, verify-equivalence). No safety or correctness issues found. Notes below are minor.

Soundness — checks out

  • vkey is trusted input; the real binding is vk_digest = keccak(postcard(vkey)) stamped into VmProof, bound into the FS statement (V3 tag), checked before STARK work, and committed as vk_digest ‖ output so the outer verifier compares against an ELF-derived digest. Recursion guest inputs are all prover-supplied, so that outer check is load-bearing — good that the smoke test exercises it.
  • Including ProofOptions in the digest is correct: query count / grinding factor affect soundness but pin no commitment. The explicit vk.options != *proof_options reject closes the weakened-options gap.
  • The None path (public verify_with_options) now derives the vkey from the trusted ELF and requires vm_proof.vk_digest to match — a strengthening; all prove paths stamp the digest, so no honest proof regresses.
  • Panic fix (vk.pages.get(index).copied() plus the length check) correctly turns a short pages vec into InvalidVerifyingKey instead of an out-of-bounds panic.

Minor

  1. Stale doc on VmAirs::new_with_vkey (prover/src/lib.rs): the comment says 'the bitwise preprocessed commitment is taken from it instead of being recomputed', but the function actually sources bitwise, decode, register, keccak_rc AND per-page commitments from the vkey. Update it to match the five-table behavior.
  2. Prover/verifier page-ordering coupling: the digest only matches if traces.page_configs (prover) and page_configs_from_elf_and_runtime (verifier) yield identical ordering. Both sort by page_base and tests cover it, so it's fine today — worth a one-line note near vkey.pages that ordering must stay in lockstep.
  3. The guest recomputes vkey.compute_digest() after verify already computed it internally; negligible vs. STARK verify, not worth changing.

Nothing blocking.

Comment thread prover/src/statement.rs Outdated
Base automatically changed from pr/profiling to main July 3, 2026 15:01
Oppen added 7 commits July 3, 2026 18:09
Add `VmVerifyingKey` (prover/src/vkey.rs): host-derived cache of the five
preprocessed-table Merkle commitments (BITWISE, DECODE, REGISTER,
KECCAK_RC, per-PAGE). `VmAirs::new_with_vkey` /
`verify_with_options_with_vkey` take the cached commitments instead of
recomputing them — recomputation is ~87% of verifier cycles inside the
recursion guest. Soundness is preserved by Fiat-Shamir.

The recursion and deserialize-only guests and the smoke test now encode
the vkey into the postcard blob `(VmProof, elf, opts, vkey)`.
A prover-supplied vkey defeats the preprocessed-commitment check:
Fiat-Shamir only catches post-hoc tampering, not a coordinated
prover committing to a forged table with a matching vkey. Bind
keccak(vkey) as vk_digest: embed ProofOptions in the vkey (query
count and grinding factor pin no commitment), stamp the digest into
VmProof and the statement transcript (V3), check it in verify
before STARK work, and commit vk_digest || inner output from the
recursion guest so the outer verifier can compare against a digest
derived from the trusted inner ELF.

Also validate vkey version/page-count instead of panicking on
short pages, and reject on options mismatch.
The comment should still say when it needs bumping, not why it was bumped the last time.
The host roundtrip test still decoded (VmProof, elf, opts); postcard
discards trailing bytes, so it silently skipped the vkey and the
vkey verify path the guest actually exercises.
@Oppen

Oppen commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator Author

/ai-review

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codex Code Review

Findings

  • High: verify_with_options_with_vkey checks vm_proof.vk_digest == vkey.compute_digest(), but then still lets decode_commitment and page_commitments override the checked vkey when constructing AIRs. The actual commitments used by verification can therefore differ from the commitments covered by vk_digest. This weakens the new “vkey digest is program identity” guarantee for any caller that combines a vkey with explicit commitment hints. Current recursion guest passes None, but the public API permits the unsafe combination. Reject overrides when vkey.is_some(), or validate every override equals the corresponding vkey/canonical commitment before AIR construction. See lib.rs, lib.rs, lib.rs, lib.rs.

Notes

I reviewed statically only, per the sandbox instructions. No build or tests were run.

@claude

claude Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Review: vkey + commitment caching (recursion opt)

I traced the full soundness path — the vk_digest Fiat-Shamir binding, the verify_with_options_with_vkey trusted-input model, the zero-init/private-input slot enforcement, and the outer verify_recursion_commitment check — plus the new tests. No significant safety, correctness, or performance issues found. This is careful, well-tested work.

Soundness (verified, no issues)

  • The trusted-input model is correct and clearly documented: with a caller-supplied vkey, DECODE/REGISTER/PAGE commitments are taken from the vkey unchecked, and soundness comes only from the outer verifier re-deriving the canonical vkey from the trusted inner ELF and comparing vk_digest. Direct callers get vkey = None, which derives from the trusted ELF as before.
  • The reclassification gap (zero-init/private-input slots are hashed into the digest but ignored by new_with_vkey) is closed by the explicit per-slot enforcement loop in verify_with_options_with_vkey, and test_vkey_zero_init_slot_enforced exercises exactly that gap by re-stamping vk_digest.
  • vk.pages[i] indexing is safe: guarded by the pages.len() == page_configs.len() check (covered by test_vkey_short_pages_rejects).
  • ProofOptions is bound three ways (explicit vk.options check, digest, and vm_proof.vk_digest match). Deriving Eq on it is safe — all fields are integers (no floats). security_bits isn't stored, but fri_number_of_queries + grinding_factor are, which is what actually pins soundness.
  • Statement DOMAIN_TAG correctly bumped V2V3 for the encoding change; old proofs won't cross-verify, which is intended.
  • The re-derivation in verify_recursion_commitment uses prover-supplied runtime_page_ranges/num_private_input_pages, but the trusted guest commits those directly from the in-guest–verified VmProof and the inner statement already binds them, so they can't be forged independently of a matching vk_digest.
  • No new panic surface: from_elf_and_options's .expect() on the decode commitment only runs on trusted ELF bytes (same as the pre-PR new() path). verify_vm_minimal doesn't absorb the statement, so the vk_digest: [0u8; 32] in minimal test proofs is genuinely never read — the comment is accurate.

Minor / optional (non-blocking)

  • Double ELF hash in the guest: main.rs computes elf_digest(&inner_elf) for the commitment, and verify_with_options_with_vkeyabsorb_statement hashes the same ELF again internally. Two full Keccak passes over the ELF inside the guest. Negligible against ~115M cycles, but since this PR is cycle-focused you could optionally have verify return/expose the digest it already computes. Not worth API churn on its own.

Nice work on the test coverage — the tamper-rejection matrix (bitwise/decode/register/keccak_rc/page/zero-init/options/version/short-pages) plus the forged-program and wrong-ELF recursion tests cover the interesting cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant