
fix: add importAsync() and rebuildHnswIndex() to fix recall() after import()#318

Open
luminexo wants to merge 2416 commits into ruvnet:main from luminexo:fix/import-rebuild-hnsw

Conversation

@luminexo

Summary

Fixes #315

After calling engine.import(data), engine.recall() always returns [] even though memories are present in this.memories. The HNSW vector index is not rebuilt during import, and the brute-force fallback never triggers because empty results don't throw errors.

Changes

  1. Add importAsync(data, merge) - Async version that calls import() + rebuildHnswIndex()
  2. Add rebuildHnswIndex() - Re-inserts all memories into HNSW index
  3. Fix recall() - Fall through to brute-force when HNSW returns empty results (not just on errors)

Root Cause

import() populates this.memories but never calls vectorDb.insert(). After import, recall() calls vectorDb.search(), receives empty array, and returns it immediately. The catch block only triggers on thrown errors, not on empty results.
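The corrected control flow can be sketched as follows. This is a minimal stand-in, not the engine's actual implementation: the `vectorDb`/`memories` shapes and the cosine-similarity scoring are assumptions for illustration; only the empty-result fallback behavior comes from this PR.

```javascript
// Sketch of the recall() fix: fall through to brute force on EMPTY results,
// not only when vectorDb.search() throws.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function recall(query, { vectorDb, memories }, limit = 5) {
  let hits = [];
  try {
    hits = vectorDb.search(query, limit);
  } catch (_) {
    hits = []; // old behavior: fallback was only reachable from here
  }
  // New: also fall back when the index is silently empty (e.g. after import()).
  if (hits.length === 0 && memories.size > 0) {
    hits = [...memories.values()]
      .map((m) => ({ ...m, score: cosine(query, m.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
  return hits;
}
```

With an un-rebuilt (empty) HNSW index, this returns the brute-force ranking instead of `[]`.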

Usage
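A sketch of the two call paths this PR adds. The `MemoryEngine` class below is a hypothetical stand-in so the snippet is self-contained; only the method names `import()`, `importAsync()`, `rebuildHnswIndex()`, and `recall()` come from this PR, and the internals are simplified.

```javascript
// Hypothetical stand-in; in practice the engine comes from this package.
class MemoryEngine {
  constructor() {
    this.memories = new Map();
    this.index = []; // stand-in for the HNSW index
  }
  import(data) {
    // Sync import: populates memories but does NOT touch the index (issue #315).
    for (const m of data) this.memories.set(m.id, m);
  }
  rebuildHnswIndex() {
    // Re-insert every memory into the index.
    this.index = [...this.memories.values()];
  }
  async importAsync(data) {
    // Async convenience path: import + rebuild in one call.
    this.import(data);
    this.rebuildHnswIndex();
  }
  recall(query) {
    // Simplified: the real recall() searches HNSW with a brute-force fallback.
    return this.index.filter((m) => m.text.includes(query));
  }
}

// Option 1: one-call async path.
const engine = new MemoryEngine();
engine.importAsync([{ id: '1', text: 'hello world' }]).then(() => {
  console.log(engine.recall('hello').length); // 1, not 0
});

// Option 2: keep the sync import(), then rebuild explicitly.
const engine2 = new MemoryEngine();
engine2.import([{ id: '1', text: 'hello world' }]);
engine2.rebuildHnswIndex();
```

Either path leaves the index populated, so `recall()` no longer returns `[]` after a data import.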

Test Plan

  • Import data with importAsync() and verify recall() returns results
  • Import data with import() + rebuildHnswIndex() and verify recall() returns results
  • Verify brute-force fallback works when HNSW is empty

Fixes: #315

ruvnet and others added 30 commits March 12, 2026 10:21
…arch (ruvnet#255)

* docs(research): add ultra-low-bit quantization & edge deployment research

Comprehensive research collection on 2-bit/3-bit quantization for ruvLLM:

- 01: Ultra-low-bit quantization survey (ICLR'26, QuIP, BitNet, I-quants)
- 02: Quantization-aware training (QAT) with reasoning preservation
- 03: QuIP 2-bit framework analysis (incoherence processing, E8 lattice)
- 04: MoE memory-aware routing for edge SRAM budgets
- 05: ruvLLM quantization architecture deep review and gap analysis
- 06: Rust implementation plan for 2-bit QAT pipeline (14-week roadmap)
- 07: Novel 3-int pi-constant quantization using irrational scaling

Key findings: ruvLLM has strong foundations (BitNet, K-quants, GGUF, KV cache)
but needs QAT training loop and differentiable quantization primitives.
Pi-constant scaling provides ~0.5 bit effective precision gain at 3-bit.

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj

* docs(adr): add ADR-090 ultra-low-bit QAT & pi-quantization DDD architecture

Comprehensive architecture decision record for implementing 2-bit/3-bit
quantization-aware training in ruvLLM using Domain-Driven Design:

- 5 bounded contexts: Quantization Core, Training, MoE Routing, WASM Runtime, Observability
- Pi-constant quantization with irrational scaling (pi/k step sizes)
- QAT training loop with STE variants and LoRA-QAT lightweight path
- QuIP incoherence via fast Walsh-Hadamard (O(n log n))
- Memory-aware MoE routing with expert precision allocation
- WASM SIMD128 kernels reusing existing tl1_wasm.rs LUT pattern
- Security: weight integrity, GGUF validation, WASM sandbox
- Benchmarking: criterion suite with throughput/quality targets
- 14-week timeline, maps to 18 existing files for extension

Placed in docs/adr/ddd/ per DDD architectural pattern organization.

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj

---------

Co-authored-by: Claude <noreply@anthropic.com>
Add comprehensive design doc for INT8 quantization implementation
in ruvector-cnn, including calibration strategies and SIMD optimization.

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit aee77ba

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Formalizes INT8 quantization for ruvector-cnn with DDD bounded contexts:
- Quantization Core: params, tensors, scale computation
- Calibration: statistics, histograms, MinMax/Percentile methods
- Inference: QuantizedConv2d, fused BatchNorm, INT8 ReLU
- SIMD Kernels: AVX2, NEON, WASM INT8 implementations
- Observability: benchmarks, accuracy validation

Targets 2-4x speedup over FP32 with <1% accuracy loss.
Related to ADR-090 (ultra-low-bit QAT for LLMs).

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit b1ab65d

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
  Built from commit e683eb4

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
…, add implementation checklists

ADR-090 (Ultra-Low-Bit QAT):
- Changed status to "Accepted (Staged Implementation)"
- Added decision statement choosing LoRA-QAT as first path
- Added staged implementation phases (4 phases, explicit gates)
- Added validation plan defining "better" (MSE, spectral, cosine, outlier retention)
- Added reasoning preservation metrics (PPL, GSM8K, HumanEval, tool use, long context)
- Added system invariants (INV-1 through INV-8)
- Added acceptance gates (G1-G6) with rollback triggers
- Restructured success criteria into correctness/performance/quality/rollout

ADR-091 (INT8 CNN Quantization):
- Changed status to "Accepted"
- Added decision statement with acceptance benchmark
- Added system invariants (INV-1 through INV-8)
- Added operator coverage table (11 operators)
- Added graph rewrite passes section (4 passes)
- Added deployment policy matrix
- Added acceptance gates (7 gates) with rollback conditions

ADR-092 (MoE Memory-Aware Routing):
- Split from ADR-090 as routing affects scheduling/cache, not representation
- Added decision statement with acceptance benchmark (≥70% cache hit rate)
- Added system invariants (INV-1 through INV-6)
- Added acceptance gates (G1-G5) with rollback conditions
- Added domain analysis with bounded context

Implementation Checklists:
- ADR-090: 6 phases, ~28 files, 16 new + 12 extended
- ADR-091: 6 phases, acceptance gate verification commands

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit 6c1a674

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Phase 1-4 implementation of ADR-090 with 114 tests passing.

## Core Quantization (src/quantize/)
- pi_quant.rs: PiQuantizer with π/k step sizes, Pi3BitBlock, Pi2BitBlock
- pi_quant_simd.rs: NEON/AVX2/scalar dequantization kernels (2.1x speedup)
- hadamard.rs: Fast Walsh-Hadamard O(n log n), INV-4 orthogonality verified
- incoherence.rs: IncoherenceTransform for QuIP-style decorrelation
- quip.rs: Q2_QuIP variant combining incoherence + 2-bit K-quant
- security.rs: WeightIntegrity, GGUF validation, bounds checking

## QAT Infrastructure (src/qat/)
- config.rs: QatConfig, SteVariant, QuantGranularity with builder pattern
- ste.rs: Straight-through estimator (Standard, Clipped, LSQ, EWGS)
- differentiable_quant.rs: DifferentiableQuantizer trait, PiQuantDifferentiable
- calibration.rs: CalibrationEngine with mixed-domain support
- distillation.rs: Teacher-student composite loss (L_task + L_KD + L_reasoning)
- reasoning_loss.rs: Chain-of-thought fidelity preservation
- training_loop.rs: QatTrainer orchestrator with checkpointing
- lora_qat.rs: Memory-efficient LoRA-QAT (50 MB vs 114 GB for full QAT)

## WASM Integration (ruvllm-wasm/)
- pi_quant_wasm.rs: PiQuantWasm with SIMD128 kernel, JSON serialization
- quant_bench_wasm.rs: QuantBenchWasm for in-browser benchmarking
- Feature flags: pi-quant, qat

## Tests (114 passing)
- pi_quant_tests.rs (35): Round-trip, block packing, bounds checking
- hadamard_tests.rs (23): Orthogonality, invertibility, energy preservation
- ste_tests.rs (24): Gradient correctness, PyTorch reference comparison
- simd_equivalence_tests.rs (19): SIMD ≈ scalar within 1 ULP (INV-8)
- acceptance_gates.rs (13): G1-G5 quality and security gates

## Benchmarks (benches/pi_quant_bench.rs)
- Hadamard 4096: 5.3 μs (target <50 μs) ✓
- NEON dequant: 2.54 GiB/s (2.1x over scalar)
- QAT backward: 7.3 Gelem/s

## Invariants Verified
- INV-1: STE gradient flow
- INV-2: Scale positivity (α > 0)
- INV-3: Step size constraint (π/k)
- INV-4: Hadamard orthogonality
- INV-5: Calibration provenance
- INV-8: SIMD ≈ scalar (≤1 ULP)

Co-Authored-By: claude-flow <ruv@ruv.net>
- Add AVX-512 dequantization kernel (16-wide SIMD, target >12 GB/s)
- Add AVX2 quantization kernel (8-wide SIMD) for forward pass
- Add AVX2 2-bit quantization kernel
- Optimize NEON kernel with prefetching and 8-group batching
- Add inline assembly prefetch (prfm pldl1keep)
- Update benchmarks with new throughput tests
- All 77 tests pass (pi_quant: 35, simd_equivalence: 19, hadamard: 23)

Performance optimizations target ADR-090 requirements:
- Quantize throughput: >1 GB/s (was 467 MiB/s)
- NEON dequant: >10 GB/s (was 2.54 GiB/s)
- AVX-512 dequant: >12 GB/s (new)

Co-Authored-By: claude-flow <ruv@ruv.net>
Remove neon_process_4_groups_ultra() which was superseded by the
optimized 8-group batching implementation with prefetching.

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit 42c51ac

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Complete implementation of INT8 quantization for ruvector-cnn:

Phase 1 - Core Infrastructure:
- QuantizationParams, QuantizationScheme, QuantizationMode
- QuantizedTensor<i8> with quantize/dequantize methods
- CalibrationMethod (MinMax, Percentile, MSE, Entropy)
- 34 unit tests passing

Phase 2 - INT8 Kernels:
- Scalar reference: conv2d, depthwise_conv2d, matmul, requantize
- AVX2 SIMD: _mm256_maddubs_epi16 for 2-4x speedup
- ARM NEON: vmull_s8, vpadalq_s16 for 2-3x speedup
- WASM SIMD128: i8x16 operations for 1.5-2x speedup

Phase 3 - Graph Rewrite Passes:
- GR-1: BatchNorm fusion into Conv weights
- GR-2: Zero-point correction pre-computation
- GR-3: Q/DQ node insertion at FP32/INT8 boundaries
- GR-4: ReLU/HardSwish fusion with LUT

Phase 4 - Quantized Layers:
- QuantizedConv2d with per-channel quantization
- QuantizedDepthwiseConv2d for MobileNet
- QuantizedLinear for FC layers
- QuantizedMaxPool2d/AvgPool2d
- QuantizedResidualAdd with scale alignment

Phase 6 - Tests & Benchmarks:
- quality_validation.rs: cosine similarity ≥0.995
- acceptance_gates.rs: 7 ADR-091 gates
- kernel_equivalence.rs: SIMD vs scalar validation
- int8_bench.rs: Criterion benchmarks

Performance targets:
- 2.5x latency improvement (MobileNetV3)
- 4x memory reduction
- <1% accuracy degradation

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit c39bf72

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Implements memory-aware expert routing with cache residency bonus:

## New moe/ Module (5 files, ~4,300 lines)
- router.rs: MemoryAwareRouter with cache bonus (0.15 default)
  - INV-6 compliant (deterministic tie-breaking)
  - PagingRequest generation for non-resident experts
- affinity.rs: EMA-based expert affinity tracking
  - INV-2 compliant (monotonic decay without activation)
  - top_k_by_affinity() for prefetch predictions
- precision_allocator.rs: Hot/warm/cold precision assignment
  - Frequency-based percentile thresholds
  - GGUF format mapping (Q4_K_M, Q3_K, Q2_K)
- sram_mapper.rs: Hardware memory hierarchy config
  - Presets: RPi5, Mobile, Desktop, WasmBrowser
  - Tier assignment (SRAM/DRAM/Storage)
- metrics.rs: MoE routing metrics tracking
  - Cache hit rate, paging latency, prefetch accuracy

## Extended bitnet/expert_cache.rs
- suggest_eviction_with_affinity(): Combined LRU/LFU + affinity
- prefetch_by_affinity(): Affinity-based expert prefetching
- hot_experts(): List currently cached experts

## Tests (131 total)
- 86 MoE unit tests
- 19 integration tests (GATE-1 through GATE-4 validation)
- 26 ExpertCache tests

## Benchmarks (9 suites)
- Routing overhead: ~22 ns (target: ≤15 μs) ✅
- Cache hit rate simulation
- Affinity update, precision allocation

Target: ≥70% cache hit rate vs 34% baseline

Co-Authored-By: claude-flow <ruv@ruv.net>
HIGH severity security fixes:
- router: Change new() from panic to Result<Self, &'static str>
- router: Change with_default_affinity() to return Result
- precision_allocator: Change new() to return Result, add new_unchecked()
- sram_mapper: Change assign_tier() from assert! to returning bool

MEDIUM severity security fixes:
- router: Add NaN/Inf validation in apply_cache_bonus_inplace()
- router: Handle NaN in select_top_k(), treat as NEG_INFINITY
- affinity: Add NaN handling in top_k_by_affinity() with deterministic tie-breaking
- affinity: Add NaN handling in least_affinity() for eviction decisions
- sram_mapper: Fix division by zero in priority_score() when last_access=0

P0 performance optimizations:
- router: Add apply_cache_bonus_inplace() to avoid allocation in hot path
- router: Use select_nth_unstable_by for partial sort when k << n (O(n) vs O(n log n))

All 103 tests pass (84 unit + 19 integration).

Co-Authored-By: claude-flow <ruv@ruv.net>
SIMD decay optimization (affinity.rs):
- Add decay_scores_simd() with platform-specific implementations
- NEON intrinsics for ARM64 (4-wide vectorization)
- AVX2 intrinsics for x86_64 (8-wide vectorization)
- Scalar fallback for other platforms
- Handles non-aligned sizes with remainder loop

Bitmask cache residency (router.rs):
- Replace Vec<bool> with CacheMask bitmask structure
- u64 for ≤64 experts (single word, cache-friendly)
- Vec<u64> bitvector for >64 experts (larger models)
- Efficient popcount for resident_list()
- O(1) is_set/set operations via bitwise ops

Edge case tests added:
- Non-aligned SIMD sizes (1, 3, 5, 7, 9, 15, 17, 33, 65 experts)
- Large expert counts (256 experts)
- SIMD vs scalar correctness verification
- CacheMask with >64 experts (128 experts)
- Out-of-bounds handling
- Empty cache state

All 92 unit tests + 19 integration tests pass.

Co-Authored-By: claude-flow <ruv@ruv.net>
P2: Buffer reuse optimizations
- Add reusable score_buffer and index_buffer to avoid hot-path allocations
- Add route_into_buffer() using pre-allocated buffers
- Add apply_cache_bonus_inplace_buffer() for in-place operations
- Add select_top_k_buffered() using pre-allocated index buffer
- Add route_batch() for efficient batch token routing
- Add bulk metric recording methods (record_cache_hits/record_cache_misses)

P3: Branch hints for hot paths
- Add #[inline] attributes to all hot path methods
- route(), route_into_buffer(), apply_cache_bonus_inplace_buffer()
- select_top_k_buffered(), select_top_2_unrolled(), is_set(), set()

P4: Loop unrolling for small arrays
- Add select_top_2_unrolled() for common top-2 MoE configuration
- Single pass through scores to find best and second-best
- Avoids sorting overhead for the most common case

Performance impact:
- P2: Eliminates Vec allocations in hot routing path
- P3: Reduces function call overhead via inlining
- P4: 2x faster top-2 selection vs full sort

All 93 MoE tests pass.

Co-Authored-By: claude-flow <ruv@ruv.net>
Add comprehensive benchmarks for memory-aware router optimizations:

- bench_memory_aware_router: Tests MemoryAwareRouter performance
  - route_top2: P4 unrolled top-2 selection benchmark
  - route_batch_8: P2 batch routing with buffer reuse
  - cache_mask_check_64/128: P1 bitmask lookup performance
  - select_top2_vs_sort: Compare unrolled vs sorted selection
  - select_top4_partial_sort: Partial sort for larger K

- bench_simd_affinity_decay: Tests SIMD decay performance
  - decay_all: P1 SIMD-optimized decay across expert counts
  - update_with_activation: Combined decay + boost performance

Validates ADR-092 targets:
- Routing overhead <= 15 us
- Cache hit rate >= 70%

Co-Authored-By: claude-flow <ruv@ruv.net>
- Bump workspace version from 2.0.5 to 2.0.6
- Update README with ADR-090 (Pi-Quantization) features
- Update README with ADR-091 (INT8 CNN Quantization) features
- Update README with ADR-092 (MoE Memory-Aware Routing) features
- Published ruvllm v2.0.6 and ruvector-cnn v2.0.6 to crates.io

Co-Authored-By: claude-flow <ruv@ruv.net>
Co-Authored-By: claude-flow <ruv@ruv.net>
The _mm512_roundscale_ps intrinsic requires a compile-time constant
for the rounding mode parameter. Changed from runtime let binding
to const to fix CI compilation on AVX-512 systems.

Co-Authored-By: claude-flow <ruv@ruv.net>
…re-routing

feat(adr-090-092): Pi-Quantization, INT8 CNN, MoE Memory-Aware Routing
  Built from commit 5a4edc1

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Built from commit 5a4edc1

Platforms: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc

Co-Authored-By: claude-flow <ruv@ruv.net>
…rics

P0: Router buffer reuse optimization
- Add pre-allocated result_buffer to MemoryAwareRouter
- Eliminate collect() allocation in select_top_k_buffered()
- Use std::mem::take for zero-copy buffer handoff
- Expected savings: 1-2µs per routing call

P1: Optional routing metrics feature flag
- Add 'routing-metrics' feature (enabled by default)
- Conditionally compile Instant::now() and metrics tracking
- Allows production builds to avoid syscall overhead (~0.04-0.08µs)

Performance Analysis Documentation:
- MoE routing optimization analysis report
- Comprehensive architecture review (5 documents)
- Identifies 8 additional optimization opportunities

ADR-092 targets: <10µs routing latency, 70%+ cache hit rate
All 26 MoE router tests pass.

Co-Authored-By: claude-flow <ruv@ruv.net>
perf(ruvllm): Deep optimization for MoE routing and benchmark analysis
  Built from commit fd3048c

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
)

* feat(ruvix): implement ADR-087 RuVix Cognition Kernel Phase A

Implements the complete Phase A (Linux-hosted) RuVix Cognition Kernel
with 9 crates, 760 tests, and comprehensive documentation.

## Core Crates (9)
- ruvix-types: 6 kernel primitives (Task, Capability, Region, Queue, Timer, Proof)
- ruvix-cap: seL4-inspired capability management with derivation trees
- ruvix-region: Memory regions (Immutable, AppendOnly, Slab policies)
- ruvix-queue: io_uring-style lock-free IPC with zero-copy semantics
- ruvix-proof: 3-tier proof engine (Reflex <100ns, Standard <100us, Deep <10ms)
- ruvix-sched: Coherence-aware scheduler with priority computation
- ruvix-boot: 5-stage RVF boot loader with ML-DSA-65 signatures
- ruvix-vecgraph: Kernel-resident vector/graph stores with HNSW
- ruvix-nucleus: Unified kernel entry point with 12 syscalls

## Security (SEC-001, SEC-002)
- Boot signature failure: PANIC immediately, no fallback path
- Proof cache: 100ms TTL, single-use nonces, max 64 entries
- Capability delegation depth: max 8 levels with audit warnings

## Architecture
- no_std compatible for Phase B bare metal port
- Proof-gated mutation: every state change requires cryptographic proof
- Capability-based access control: no syscall without valid capability
- Zero-copy IPC via region descriptors (TOCTOU protected)

## Documentation
- Main README with architecture diagrams
- Individual crate READMEs with usage examples
- Architecture decision records

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: update ADR-087 status and add RuVix to root README

- Update ADR-087 status from Proposed to Accepted (Phase A Implemented)
- Add implementation status table with all 9 crates and 760 tests
- Document security invariants implemented (SEC-001 through SEC-004)
- Add collapsed RuVix section to root README with architecture diagram

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore: update ruvector-coherence dependency to 2.0.4 for crates.io publish

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(ruvix): implement ADR-087 Phase B bare metal AArch64 support

Phase B adds bare metal AArch64 support for the RuVix Cognition Kernel:

New crates:
- ruvix-hal: Hardware Abstraction Layer traits (~500 lines)
  - Console, InterruptController, Timer, Mmu, PowerManagement traits
  - Platform-agnostic design for ARM64/RISC-V/x86_64
  - 15 unit tests passing

- ruvix-aarch64: AArch64 boot and MMU support (~2,000 lines)
  - _start assembly entry, exception vectors
  - 4-level page tables with capability metadata
  - System register accessors (SCTLR_EL1, TCR_EL1, TTBR0/1)
  - Implements ruvix_hal::Mmu trait

- ruvix-drivers: Device drivers for QEMU virt (~1,500 lines)
  - PL011 UART driver (115200 8N1, FIFO, interrupts)
  - GIC-400 interrupt controller (256 IRQs, 16 priorities)
  - ARM Generic Timer (deadline scheduling)
  - Volatile MMIO with memory barriers (DMB, DSB, ISB)

Build infrastructure:
- aarch64-boot/ with linker script and custom Rust target
- QEMU virt runner integration (Cortex-A72, 128MB RAM)
- Makefile with build/run/debug targets

ADR-087 updated with:
- Phase B objectives and new crate specifications
- QEMU virt memory map (128MB RAM at 0x40000000)
- 5-stage boot sequence documentation
- Security enhancements and testing strategy
- Raspberry Pi 4/5 platform differences

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(ruvix): implement Phases C/D/E and QEMU swarm simulation

This adds full bare metal OS capabilities to the RuVix Cognition Kernel:

## Phase C: Multi-Core & DMA Support
- ruvix-smp: Symmetric multi-processing (256 cores, spinlocks, IPIs)
- ruvix-dma: DMA controller with scatter-gather
- ruvix-dtb: Device tree blob parser
- ruvix-physmem: Buddy allocator for physical memory

## Phase D: Raspberry Pi 4/5 Support
- ruvix-bcm2711: BCM2711/2712 SoC drivers (GPIO, mailbox, UART)
- ruvix-rpi-boot: RPi boot support (spin table, early UART)

## Phase E: Networking & Filesystem
- ruvix-net: Full network stack (Ethernet/ARP/IPv4/UDP/ICMP)
- ruvix-fs: Filesystem layer (VFS, FAT32, RamFS)

## QEMU Swarm Simulation
- qemu-swarm: Multi-QEMU cluster for distributed testing
- Network topologies: mesh, ring, star, tree
- Fault injection and chaos testing scenarios

## Summary
- 10 new crates, ~27,000 lines of code
- 400+ new tests passing
- ADR-087 updated with Phases C/D/E documentation
- Main README updated with all phases

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(ruvix): address critical security vulnerabilities CVE-001 through CVE-005

Security fixes applied from deep review audit:

- CVE-001 (CRITICAL): Add compile-time protection preventing
  `disable-boot-verify` feature in release builds. This closes
  a boot signature bypass vulnerability.

- CVE-002 (HIGH): Add MMIO address validation to GIC driver.
  `Gic::new()` now returns `Result<Self, GicError>` and validates
  addresses against known platform ranges. Added `new_unchecked()`
  for trusted callers.

- CVE-003 (HIGH): Add integer overflow protection in DTB parser.
  All offset calculations now use `checked_add()` to prevent
  buffer overflow via crafted DTB files.

- CVE-005 (HIGH): Add IPv4 header validation ensuring
  `total_length >= header_len` per RFC 791.

Also includes test fixes:
- Mark hardware-dependent tests as `#[ignore]` (MMIO, ARM timer)
- Fix swap32 test assertion in rpi-boot
- Update doctests for new GIC API

All 259 tests pass across affected crates.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(ruvix): implement CLI, kernel shell, and PBFT consensus

Implements Phase F features for the RuVix Cognition Kernel:

CLI (ruvix-cli):
- build: Cross-compile kernel for AArch64 targets
- config: Manage kernel configuration files
- dtb: Device tree blob operations (validate, dump, compile, compare, search)
- flash: UART/serial flash operations with progress reporting
- keys: Ed25519 key management with secure storage
- monitor: Real-time kernel metrics dashboard
- security: Security audit and vulnerability scanning

Kernel Shell (ruvix-shell):
- Interactive command parser with history support
- Commands: help, info, mem, tasks, caps, vectors, witness, proofs,
  queues, perf, cpu, trace, reboot
- Configurable prompt with trace mode indication
- Shell backend integration with nucleus kernel

PBFT Consensus (qemu-swarm):
- Full PBFT implementation (pre-prepare, prepare, commit phases)
- View change protocol for leader recovery
- Checkpoint mechanism for state synchronization
- Custom serde wrappers for fixed-size byte arrays (Signature, HashDigest)
- Byzantine fault tolerance (f < n/3)

Additional:
- Example RVF swarm consensus demo
- Nucleus shell backend for kernel introspection
- Fixed chrono DateTime type annotation in keys.rs

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore(ruvix): add version specs for crates.io publishing

- Add version = "0.1.0" to ruvix-dtb dependency in CLI
- Add README.md for ruvix-shell crate

Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: Reuven <cohen@ruv-mac-mini.local>
ruvnet and others added 29 commits March 28, 2026 01:54
4-phase plan for retraining RuvLTRA models on GCloud:
- Phase 1: TurboQuant-calibrated GGUF quantization (imatrix recalibration)
- Phase 2: WET-augmented SFT + DPO fine-tuning on brain knowledge + Common Crawl
- Phase 3: Benchmarking suite (HumanEval, SWE-Bench, TurboQuant quality, latency)
- Phase 4: Publishing updated models to HuggingFace with -tq variants

Uses existing phi4-finetuning-gpu Cloud Run template, Vertex AI for
training, and brain-wet-daily pipeline for data. Estimated cost: ~$70.

Co-Authored-By: claude-flow <ruv@ruv.net>
Correct TurboQuant scope (runtime KV-cache only, not weight quant),
add Current Gaps section, document existing training infrastructure
(13 components), clarify LoRA-based fine-tuning approach, reference
related ADRs (049, 090, 093).

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit 968fe21

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
  Built from commit ed93997

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
…blation

Addresses review feedback:
- Add dataset governance: record schema, source allowlist, dedup rules,
  eval contamination checks, quality scoring
- Add release gate: 7 ship/no-ship criteria (G1-G7) with automated
  release_gate.py checker
- Add ablation matrix: 5 runs (A-E) isolating imatrix, SFT, DPO, TQ
- Add rollback plan: HF git revert, registry rollback, npm patch
- Add TurboQuant serving plan: .turboquant.json sidecar config,
  runtime discovery, per-layer profiling
- Relabel cost estimate as "initial experimental compute only"
- Update status to "proposed, pending governance hardening"
- Expand next steps to 21 items across 4 phases

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit e265141

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Training tooling:
- release_gate.py: Automated 7-gate ship/no-ship checker (G1-G7)
- export_training_data.py: Dataset export with governance (schema,
  dedup, quality scoring, contamination check)
- contamination_check.py: 13-gram eval contamination detection
- run_calibration.py: Phase 1 imatrix + TurboQuant profiling
- run_sft.py: Phase 2 LoRA SFT + DPO training
- deploy_training.sh: Cloud Run job creation + Vertex AI setup
- Dockerfile: GPU training image (transformers + peft + trl)

Rust infrastructure:
- turboquant_profile.rs: .turboquant.json sidecar config loading,
  per-layer TQ config discovery, default profiles

Ref: ADR-129, ruvnet#310

Co-Authored-By: claude-flow <ruv@ruv.net>
- nightly_train.sh: 5-phase nightly pipeline (export brain learnings,
  contamination check, incremental LoRA, release gates, push to HF)
- Updated deploy_training.sh with nightly Cloud Run job + scheduler
- Updated ADR-129 with nightly continuous learning section

Schedule: daily 03:00 UTC, ~$4/day, skips if <10 new records.
All 7 release gates must pass before publishing.

Ref: ruvnet#310

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit 063c838

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
  Built from commit 8289823

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
98 brain memories + 131 ADRs + 1 routing reference.
Governance: SHA-256 dedup, quality >= 0.5, schema validated.

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit e6d4f50

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
GPU-enabled Cloud Run jobs have a maximum timeout of 1 hour.
The previous 7200s (2hr) setting was rejected by the API.

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit b866cd5

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
- Rewrite run_calibration.py to use gguf Python package + llama-cpp-python
  prebuilt wheels instead of compiling llama.cpp from source
- Simplify Dockerfile: single-stage, pip install only, no CUDA compilation
  (build time: ~5min vs 20+min)
- Update ADR-129 with tooling decision section explaining ruvllm-native choice
- Remove llama-imatrix and llama-quantize binary dependencies

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit 3dc7753

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
The pip install of llama-cpp-python from source requires ninja + cmake
for CUDA compilation. Use the prebuilt wheel from the cu124 index instead.
Falls back to source install, then transformers-only mode.

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit f220d3b

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
- Add libgomp1 (required by llama-cpp-python OpenMP)
- Use PyTorch cu124 index for proper CUDA wheel
- Set default CMD with --model-id for Cloud Run execution
- Consolidate pip installs for Docker layer cache efficiency

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit 63c68bc

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
  Built from commit 6f4b3d4

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Phase 1 calibration deployed and executed on GCloud L4 GPU.
Infrastructure: Docker image built (torch 2.5.1+cu124), 3 Cloud Run
jobs deployed, 2 schedulers enabled. Training corpus exported.
Release gate automation tested. TurboQuant sidecars on HuggingFace.

Co-Authored-By: claude-flow <ruv@ruv.net>
… models

Status: Accepted. ruvltra-small complete, 3 remaining models executing
on L4 GPU (ruvltra-medium, ruvltra-claude-code, ruvltra).

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit bab9f45

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
  Built from commit e7ad2af

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Calibration results (L4 GPU):
- ruvltra-small: 75.4 tok/s
- ruvltra-medium: 62.6 tok/s
- ruvltra-claude-code: 67.1 tok/s
- ruvltra: pending final execution

TQ profiles + benchmark_results.json uploaded to all HuggingFace models.

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit 04ed5b8

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
…mport()

Issue: ruvnet#315 - import() populates this.memories but never updates HNSW index.
After engine.import(data), recall() always returns [] because HNSW search
returns empty results and brute-force fallback never triggers (empty array,
not error).

Changes:
- Add importAsync() that calls import() + rebuildHnswIndex()
- Add rebuildHnswIndex() to re-insert memories into HNSW
- Fix recall() to fall through to brute-force when HNSW returns empty results
- Document that import() is sync and doesn't rebuild HNSW

Fixes: ruvnet#315


Development

Successfully merging this pull request may close these issues.

Bug: import() does not rebuild HNSW index — recall() always returns [] after load
