Skip to content

perf(rvf,rvm): HNSW query path, RaBitQ, contiguous slab, witness v2, mincut wiring + security hardening#555

Merged
ruvnet merged 12 commits into
mainfrom
claude/compassionate-volta-5gbbqj
Jun 12, 2026
Merged

perf(rvf,rvm): HNSW query path, RaBitQ, contiguous slab, witness v2, mincut wiring + security hardening#555
ruvnet merged 12 commits into
mainfrom
claude/compassionate-volta-5gbbqj

Conversation

@ruvnet

@ruvnet ruvnet commented Jun 12, 2026

Copy link
Copy Markdown
Owner

RVF (RuVector Format) and RVM optimization pass: deep review → SOTA research → implementation, with adversarial security audit and honest-claims corrections. 11 commits. The four library crates changed here are already published (rvf-types 0.2.1, rvf-index 0.2.0, rvf-quant 0.2.0, rvf-runtime 0.3.0).

RVF

  • HNSW wired into the runtime query pathRvfStore::query was a brute-force O(N·dim) scan; the entire rvf-index crate was unused by production queries and QualityEnvelope.evidence fabricated layer_a: true. Now: lazy build, incremental ingest, INDEX_SEG persistence (versioned backward-readable trailer), validate-or-rebuild on open, honest evidence. 21.7 ms → 1.51 ms per query at 100k×64 (14.4×), recall@10 0.968.
  • RaBitQ binary quantization (opt-in) — centroid centering, seeded randomized-Hadamard rotation, 1-bit codes + correction scalars, asymmetric estimator, two-stage oversample/rescore. Recall@10 0.972 at exactly 32× code compression. Plus Vamana α-pruning in HNSW neighbor selection (recall 0.986 → 0.996 at ef=30).
  • Contiguous vector slabHashMap<u64, Vec<f32>> → flat row-major slab with zero-copy slice accessors. Brute scan 24.5 ms → 3.8 ms (6.4×), cold open −21.5%.
  • Security hardening (from an adversarial audit): crafted-file DoS fixes in decode_payload/decode_index_seg/decode_sketch_seg/decode_product (validate-before-allocate, checked arithmetic incl. wasm32), non-blocking index rebuilds (no more O(N log N) build under the query mutex), +8 adversarial regression tests. Audit found no crypto or memory-safety breaks.
  • Earlier fixes: deterministic (distance, id) tie-breaking, crc32fast (~100× checksums), manifest tail-scan widening, quant codec round-trip, unified content-hash implementation.

RVM

  • Witness v2 — 96-byte records with 128-bit keyed-BLAKE3 chain links (was a folded 32-bit hash), one compression per append (~112 ns measured), Merkle segment sealing with signed roots + inclusion proofs (CT/QMDB-style). v1 logs still verify; the v1 head is cryptographically anchored into the first v2 record.
  • Mincut wired into split decisionsexecute_split now applies the computed cut boundary (was: empty child, cut ignored); Fennel O(deg) hot-path placement + exact Stoer-Wagner as a pressure-triggered epoch task; pressure+conductance split policy.
  • Honesty corrections: the "<10 µs partition switch" claim was a no-op bench (~6 ns) reported as "1600× faster than target" — now marked unimplemented with a canary test and corrected README; witness chain previously hashed only (prev_hash, sequence), now binds record content.

Verification

  • RVF: 1,271 passing on Windows (+38 net), 6 pre-existing env failures unchanged (verified identical on stashed baseline)
  • RVM: 875 passing (+75), zero failures
  • READMEs updated with environment-labeled measured numbers only

🤖 Generated with claude-flow

claude and others added 12 commits June 11, 2026 18:46
Equal-distance vectors were selected and ordered by HashMap iteration
order, which changes across process restarts and made query results
non-reproducible (flaky smoke_rvlite_adapter_persistence). Break ties
by vector id in both the top-k heap eviction and the final sort, in
query() and query_with_envelope().

https://claude.ai/code/session_01C83hbozEXPgoz9iJN5Smhp
…fest discovery

rvf-index:
- Cache SIMD distance-kernel dispatch in a OnceLock function-pointer
  table instead of re-running is_x86_feature_detected! on every call
- Rewrite HNSW search_layer with BinaryHeap min/max-heaps (was sorted
  Vec + O(n) mid-inserts) and a dense Vec<bool> visited bitmap (was a
  per-call SipHash HashSet); deterministic (distance, id) tie-breaking

rvf-runtime:
- Replace per-bit CRC32 loops with crc32fast (same IEEE polynomial,
  byte-identical hashes, ~100x faster) on segment write and verify
- Hoist cosine query-norm computation out of the per-vector scan loop
- Safety-net scan: single pass with HashSet membership (was
  O(k*N*neighbors) with Vec::contains)
- Bulk little-endian f32 serialization in write_vec_seg (one memcpy
  per vector instead of per-element appends)
- Progressively widen the manifest tail scan (64KB -> 1MB -> 16MB ->
  whole file): stores with large segment directories were becoming
  unreadable once the latest manifest fell outside the fixed 64KB
  window; with regression test

rvf-quant:
- encode_quant_seg now emits fully decodable payloads (delegates to
  the real scalar/product encoders; placeholders removed)
- decode_quant_seg returns Result instead of panicking on malformed
  or unknown-type payloads; round-trip and malformed-input tests

https://claude.ai/code/session_01C83hbozEXPgoz9iJN5Smhp
…ap, sched hot paths

rvm-witness (security-critical):
- The chain hash covered only (prev_hash, sequence) — record content
  (action, actor, target, payload, timestamp) was never hashed, so
  verify_chain accepted arbitrarily rewritten history. record_hash is
  now computed over the 44 content bytes (as its doc always claimed)
  and the chain binds it: H(prev || seq || record_hash). verify_chain
  recomputes content hashes; tamper-regression tests added.
- HMAC signer keys the Mac template once at construction instead of
  re-running the key schedule per record (fixed-vector test pins
  signature bytes)
- Witness ring overflow is now observable: total_overwritten counter
  and needs_drain() accessor

rvm-cap:
- Nonce replay window: colliding nonces (A + k*4096) could evict and
  re-admit nonce A. Replaced the two 32KB arrays with one 32KB
  open-addressed table (8-probe bounded); eviction raises the
  watermark so it fails closed. Regression test included.

rvm-coherence:
- internal_weight: O(MAX_EDGES) self-loop scan replaced with O(1)
  adj_matrix[i][i] read (invariant verified across all mutation paths)
- Skip ticks return a cached CoherenceDecision instead of re-running
  the O(n^2) merge-pair pass over stale data; zero-weight pairs skipped
- Mincut: scratch buffers moved into the long-lived bridge (~17KB less
  stack per call), in-place Stoer-Wagner (no working copy), bitmask
  membership, column-scan in-neighbors
- Compile-time guard: CoherenceGraph MAX_NODES > ADJ_DIM now fails to
  compile instead of panicking at the 33rd node; u64 weight deltas
  clamped at the engine boundary

rvm-coherence/rvm-partition:
- Single-slot hash indexes (id_to_node, edge_index) degraded to
  permanent O(N) scans after any collision; both now use bounded
  linear probing with tombstones and probe-proven absence

rvm-sched:
- enqueue() rejects the HYPERVISOR sentinel id, which previously
  wedged a run-queue slot permanently; defensive cleanup in
  switch_next

Tests: 733 workspace + 67 rvm-kernel lib pass (baseline 712); 23 new
tests including tamper-evidence and collision regressions.

https://claude.ai/code/session_01C83hbozEXPgoz9iJN5Smhp
RvfStore::query was a brute-force O(N*dim) scan; the rvf-index crate was
unused by production queries and QualityEnvelope.evidence fabricated
layer_a=true. The index is now built lazily on first eligible query,
maintained incrementally on ingest, persisted on close() via the existing
INDEX_SEG codec (with a versioned, backward-readable trailer for the
sparse-id mapping), and validated-or-rebuilt on open. Exact scan remains
for small stores (<1024), filtered/COW/membership queries, >25% deleted,
and force_exact; deterministic (distance, id) tie-breaking preserved on
both paths. evidence.layer_a is now set only when the index served the
query.

Measured: 21.7ms -> 1.51ms per query at 100k x 64-dim (criterion,
release), recall@10 = 0.968 at the ef_search=256 floor (>=0.95 gated by
test). +15 tests (recall, index persistence round-trip, evidence honesty,
fallback routing, compaction/overwrite invalidation).

Co-Authored-By: claude-flow <ruv@ruv.net>
…e sealing

v1 records folded chain links to 32 bits and left the head unanchorable.
The 96-byte v2 record embeds the predecessor MAC full-width and chains via
one keyed-BLAKE3 compression per append (~112ns measured, 9x under the 1us
target); keyed MACs detect last-record tampering and unkeyed forgery,
which v1 could not. Segment sealing accumulates record MACs into a
domain-separated Merkle tree (256/segment) sealed with one signature via
the existing signer infra (HMAC/dual-HMAC/Ed25519/TEE), with inclusion
proofs — expensive crypto moves off the per-record path and roots are
externally anchorable.

v1 logs still verify (version-byte dispatch; v1 only as prefix, head
anchored into the first v2 record); v1 writing is frozen. blake3 added as
pure-Rust no_std. +46 tests covering content/reorder/truncation/wrong-key
/forgery tamper modes, v1 compat, proofs, seals, and mixed logs.

Co-Authored-By: claude-flow <ruv@ruv.net>
…claim

execute_split previously created an empty child and ignored the computed
cut. It now resolves the boundary from a cached epoch SplitPlan (or
computes on demand) and re-homes move-side neighbors to the child with
their edge weights. Two-tier decisions: exact Stoer-Wagner mincut runs as
a pressure-triggered epoch task; a new Fennel placer (O(degree),
fixed-point gamma=1.5, no_std) handles hot-path placement. Split policy
combines pressure and cut quality: mid-band (8000-9500bp) splits only on
a cut with conductance <= 5000bp; critical pressure stays an
unconditional safety valve.

The sub-10us partition-switch claim was a stub certified by a no-op bench
(~6ns) reported as 1600x faster than target. The real path needs EL2
assembly the crate forbids; instead the measurable register save/restore
lower bound is implemented and benchmarked, the bench is renamed
partition_switch_validation_stub with an honesty gate, a canary test
fails if HARDWARE_SWITCH_IMPLEMENTED flips without revisiting the claim,
and the README row now reads: not validated. +30 tests.

Co-Authored-By: claude-flow <ruv@ruv.net>
rvf-quant gains a RaBitQ-style codec: global-centroid centering, 3-round
seeded randomized-Hadamard rotation (orthonormal, reproducible from a
stored u64 seed), 1-bit sign codes with per-vector norm/dot-correction
scalars, and an asymmetric full-precision-query estimator. QUANT_SEG adds
versioned type tag 4 (legacy payloads byte-frozen and still decode;
unknown versions rejected; decode stays panic-free on untrusted bytes).

Query path: opt-in two-stage search (QueryOptions::rabitq, default off) —
estimator scan with oversampling (640-candidate floor) then exact f32
rescore; deterministic (distance, id) tie-breaking; falls back to default
routing for filtered/COW/IP/cosine queries. Measured recall@10 = 0.972 vs
exact on 10k x 128 (gate >= 0.95, test-enforced); code-only compression
exactly 32x.

rvf-index: Vamana-style robust prune (alpha = 1.2, occluded backfill) at
insert and prune time; recall@10 at ef=30 improved 0.986 -> 0.996;
construction determinism preserved. +42 tests (1254 passing, no new
failures).

Co-Authored-By: claude-flow <ruv@ruv.net>
An adversarial audit confirmed a crafted .rvf could panic or OOM the
process on RvfStore::open(): unvalidated length fields drove
Vec::with_capacity before any byte-availability check. decode_payload now
bounds id_count by available delta bytes (u64 compare before the usize
cast, so 32-bit truncation cannot bypass it); decode_index_seg bounds
restart_count/layer_count/neighbor_count by remaining bytes and rejects
truncated restart padding (was a reachable slice panic); decode_sketch_seg
converts from assert-and-panic to Result with width/depth validated via
checked_mul (closes the width=0 + depth=u32::MAX bypass); decode_product
size products use checked u64 arithmetic so 32-bit (wasm32) targets cannot
wrap usize and read out of bounds. +8 adversarial regression tests.

Co-Authored-By: claude-flow <ruv@ruv.net>
…d hashing

Vector storage moves from HashMap<u64, Vec<f32>> to a contiguous row-major
slab (id->ordinal map, tombstoned deletes, slot reuse only via compaction);
HNSW/RaBitQ paths read rows as zero-copy slices and iteration is ordinal-
ordered (deterministic across restarts). Brute-force query at 100k x 64:
24.5ms -> 3.8ms (~6.4x). boot() pre-sizes the slab and bulk-copies VEC_SEG
payloads (no per-vector allocs): cold open 257ms -> 202ms (-21.5%). mmap
deferred (CRC verify touches all bytes anyway; memmap2 not in this
workspace) and documented as follow-up.

Audit finding 5: index/RaBitQ lazy builds now run with no lock held behind
an AtomicBool gate (panic-safe clear-on-drop); concurrent queries fall
back to exact scan and keep serving through the entire O(N log N) build.
Overwrite still invalidates and unlinks the stale INDEX_SEG.

Hashing: the two identical bespoke CRC32-rotation implementations in
write_path/read_path now delegate to one source of truth
(hashing::legacy_content_hash); on-disk bytes unchanged. Full rvf-wire
checksum-registry conformance (XXH3-128 + format-version bump +
dual-accept reader) documented as the remaining delta. read_path.rs also
carries the audit''s checked vec-seg size arithmetic. +11 tests; suite
1271 passing, no new failures (one pre-existing wall-clock bench
assertion flakes under load, passes in isolation).

Co-Authored-By: claude-flow <ruv@ruv.net>
…0.1.7, measured-benchmark READMEs

- rvf-types 0.2.0 -> 0.2.1 (QuantType::RaBitQ format extension)
- rvf-index 0.1.0 -> 0.2.0 (Vamana alpha-pruning, hardened INDEX_SEG codec)
- rvf-quant 0.1.0 -> 0.2.0 (RaBitQ codec; decode_sketch_seg now returns Result)
- rvf-runtime 0.2.0 -> 0.3.0 (HNSW query path, INDEX_SEG trailer, QueryOptions::rabitq, vector slab)
- dependent path-dep version reqs updated (cli, import, launch, node, server)
- @ruvector/rvf 0.2.0 -> 0.2.2, @ruvector/rvf-wasm 0.1.6 -> 0.1.7 (rebuilt wasm artifact, 1.89 toolchain + wasm-opt -Oz)
- READMEs: HNSW/RaBitQ/slab docs with measured numbers (Windows x64, criterion release, 100k x 64-dim); rvm witness v2 bench rows

Co-Authored-By: claude-flow <ruv@ruv.net>
The rvf-runtime 0.2 -> 0.3.0 version bump updated dependents inside the
rvf workspace but missed the root-workspace consumer: ruvector-robotics
pins version 0.2 alongside its path dep, which fails cargo resolution
against the bumped crate (PR #555 CI: failed to select a version for the
requirement rvf-runtime ^0.2). Root Cargo.lock refreshed.

Co-Authored-By: claude-flow <ruv@ruv.net>
@ruvnet ruvnet merged commit 3e84297 into main Jun 12, 2026
72 of 75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants