fix: add importAsync() and rebuildHnswIndex() to fix recall() after import() #318
Open
luminexo wants to merge 2416 commits into ruvnet:main from
Conversation
…arch (ruvnet#255)

* docs(research): add ultra-low-bit quantization & edge deployment research

Comprehensive research collection on 2-bit/3-bit quantization for ruvLLM:
- 01: Ultra-low-bit quantization survey (ICLR'26, QuIP, BitNet, I-quants)
- 02: Quantization-aware training (QAT) with reasoning preservation
- 03: QuIP 2-bit framework analysis (incoherence processing, E8 lattice)
- 04: MoE memory-aware routing for edge SRAM budgets
- 05: ruvLLM quantization architecture deep review and gap analysis
- 06: Rust implementation plan for 2-bit QAT pipeline (14-week roadmap)
- 07: Novel 3-int pi-constant quantization using irrational scaling

Key findings: ruvLLM has strong foundations (BitNet, K-quants, GGUF, KV cache) but needs QAT training loop and differentiable quantization primitives. Pi-constant scaling provides ~0.5 bit effective precision gain at 3-bit.

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj

* docs(adr): add ADR-090 ultra-low-bit QAT & pi-quantization DDD architecture

Comprehensive architecture decision record for implementing 2-bit/3-bit quantization-aware training in ruvLLM using Domain-Driven Design:
- 5 bounded contexts: Quantization Core, Training, MoE Routing, WASM Runtime, Observability
- Pi-constant quantization with irrational scaling (pi/k step sizes)
- QAT training loop with STE variants and LoRA-QAT lightweight path
- QuIP incoherence via fast Walsh-Hadamard (O(n log n))
- Memory-aware MoE routing with expert precision allocation
- WASM SIMD128 kernels reusing existing tl1_wasm.rs LUT pattern
- Security: weight integrity, GGUF validation, WASM sandbox
- Benchmarking: criterion suite with throughput/quality targets
- 14-week timeline, maps to 18 existing files for extension

Placed in docs/adr/ddd/ per DDD architectural pattern organization.

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj

---------

Co-authored-by: Claude <noreply@anthropic.com>
Add comprehensive design doc for INT8 quantization implementation in ruvector-cnn, including calibration strategies and SIMD optimization. Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit aee77ba. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Formalizes INT8 quantization for ruvector-cnn with DDD bounded contexts:
- Quantization Core: params, tensors, scale computation
- Calibration: statistics, histograms, MinMax/Percentile methods
- Inference: QuantizedConv2d, fused BatchNorm, INT8 ReLU
- SIMD Kernels: AVX2, NEON, WASM INT8 implementations
- Observability: benchmarks, accuracy validation

Targets 2-4x speedup over FP32 with <1% accuracy loss. Related to ADR-090 (ultra-low-bit QAT for LLMs).

Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit b1ab65d. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Built from commit e683eb4. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
…, add implementation checklists

ADR-090 (Ultra-Low-Bit QAT):
- Changed status to "Accepted (Staged Implementation)"
- Added decision statement choosing LoRA-QAT as first path
- Added staged implementation phases (4 phases, explicit gates)
- Added validation plan defining "better" (MSE, spectral, cosine, outlier retention)
- Added reasoning preservation metrics (PPL, GSM8K, HumanEval, tool use, long context)
- Added system invariants (INV-1 through INV-8)
- Added acceptance gates (G1-G6) with rollback triggers
- Restructured success criteria into correctness/performance/quality/rollout

ADR-091 (INT8 CNN Quantization):
- Changed status to "Accepted"
- Added decision statement with acceptance benchmark
- Added system invariants (INV-1 through INV-8)
- Added operator coverage table (11 operators)
- Added graph rewrite passes section (4 passes)
- Added deployment policy matrix
- Added acceptance gates (7 gates) with rollback conditions

ADR-092 (MoE Memory-Aware Routing):
- Split from ADR-090 as routing affects scheduling/cache, not representation
- Added decision statement with acceptance benchmark (≥70% cache hit rate)
- Added system invariants (INV-1 through INV-6)
- Added acceptance gates (G1-G5) with rollback conditions
- Added domain analysis with bounded context

Implementation Checklists:
- ADR-090: 6 phases, ~28 files, 16 new + 12 extended
- ADR-091: 6 phases, acceptance gate verification commands

Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit 6c1a674. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Phase 1-4 implementation of ADR-090 with 114 tests passing.

## Core Quantization (src/quantize/)
- pi_quant.rs: PiQuantizer with π/k step sizes, Pi3BitBlock, Pi2BitBlock
- pi_quant_simd.rs: NEON/AVX2/scalar dequantization kernels (2.1x speedup)
- hadamard.rs: Fast Walsh-Hadamard O(n log n), INV-4 orthogonality verified
- incoherence.rs: IncoherenceTransform for QuIP-style decorrelation
- quip.rs: Q2_QuIP variant combining incoherence + 2-bit K-quant
- security.rs: WeightIntegrity, GGUF validation, bounds checking

## QAT Infrastructure (src/qat/)
- config.rs: QatConfig, SteVariant, QuantGranularity with builder pattern
- ste.rs: Straight-through estimator (Standard, Clipped, LSQ, EWGS)
- differentiable_quant.rs: DifferentiableQuantizer trait, PiQuantDifferentiable
- calibration.rs: CalibrationEngine with mixed-domain support
- distillation.rs: Teacher-student composite loss (L_task + L_KD + L_reasoning)
- reasoning_loss.rs: Chain-of-thought fidelity preservation
- training_loop.rs: QatTrainer orchestrator with checkpointing
- lora_qat.rs: Memory-efficient LoRA-QAT (50 MB vs 114 GB for full QAT)

## WASM Integration (ruvllm-wasm/)
- pi_quant_wasm.rs: PiQuantWasm with SIMD128 kernel, JSON serialization
- quant_bench_wasm.rs: QuantBenchWasm for in-browser benchmarking
- Feature flags: pi-quant, qat

## Tests (114 passing)
- pi_quant_tests.rs (35): Round-trip, block packing, bounds checking
- hadamard_tests.rs (23): Orthogonality, invertibility, energy preservation
- ste_tests.rs (24): Gradient correctness, PyTorch reference comparison
- simd_equivalence_tests.rs (19): SIMD ≈ scalar within 1 ULP (INV-8)
- acceptance_gates.rs (13): G1-G5 quality and security gates

## Benchmarks (benches/pi_quant_bench.rs)
- Hadamard 4096: 5.3 μs (target <50 μs) ✓
- NEON dequant: 2.54 GiB/s (2.1x over scalar)
- QAT backward: 7.3 Gelem/s

## Invariants Verified
- INV-1: STE gradient flow
- INV-2: Scale positivity (α > 0)
- INV-3: Step size constraint (π/k)
- INV-4: Hadamard orthogonality
- INV-5: Calibration provenance
- INV-8: SIMD ≈ scalar (≤1 ULP)

Co-Authored-By: claude-flow <ruv@ruv.net>
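The π/k step-size idea behind PiQuantizer can be illustrated with a minimal scalar sketch. This is an assumption-laden illustration only, not the crate's API: the real Rust implementation packs Pi3BitBlock/Pi2BitBlock blocks and handles bounds differently; the function name `piQuantize` and the derivation of the scale from the absolute max are choices made here for clarity.

```javascript
// Illustrative sketch of pi-constant quantization: the step size is
// constrained to alpha * (pi / k) (INV-3) with alpha > 0 (INV-2), so
// quantization levels sit on irrational multiples rather than a dyadic grid.
// NOT the real PiQuantizer; block packing and bounds handling are omitted.
function piQuantize(values, bits, k) {
  const qmax = (1 << (bits - 1)) - 1; // e.g. 3 levels each side for 3-bit signed
  // Derive alpha from the data range so the largest value maps to qmax
  // (an assumed calibration rule for this sketch).
  const absMax = Math.max(...values.map(Math.abs));
  const alpha = (absMax * k) / (Math.PI * qmax) || 1e-12;
  const step = alpha * (Math.PI / k); // INV-3: step = alpha * pi / k
  const q = values.map((x) =>
    Math.max(-qmax - 1, Math.min(qmax, Math.round(x / step)))
  );
  const dequant = q.map((v) => v * step);
  return { q, dequant, step };
}
```

Round-tripping any in-range value through quantize/dequantize should leave an error of at most half a step, mirroring what the round-trip tests in pi_quant_tests.rs check.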
- Add AVX-512 dequantization kernel (16-wide SIMD, target >12 GB/s)
- Add AVX2 quantization kernel (8-wide SIMD) for forward pass
- Add AVX2 2-bit quantization kernel
- Optimize NEON kernel with prefetching and 8-group batching
- Add inline assembly prefetch (prfm pldl1keep)
- Update benchmarks with new throughput tests
- All 77 tests pass (pi_quant: 35, simd_equivalence: 19, hadamard: 23)

Performance optimizations target ADR-090 requirements:
- Quantize throughput: >1 GB/s (was 467 MiB/s)
- NEON dequant: >10 GB/s (was 2.54 GiB/s)
- AVX-512 dequant: >12 GB/s (new)

Co-Authored-By: claude-flow <ruv@ruv.net>
Remove neon_process_4_groups_ultra() which was superseded by the optimized 8-group batching implementation with prefetching. Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit 42c51ac. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Complete implementation of INT8 quantization for ruvector-cnn:

Phase 1 - Core Infrastructure:
- QuantizationParams, QuantizationScheme, QuantizationMode
- QuantizedTensor<i8> with quantize/dequantize methods
- CalibrationMethod (MinMax, Percentile, MSE, Entropy)
- 34 unit tests passing

Phase 2 - INT8 Kernels:
- Scalar reference: conv2d, depthwise_conv2d, matmul, requantize
- AVX2 SIMD: _mm256_maddubs_epi16 for 2-4x speedup
- ARM NEON: vmull_s8, vpadalq_s16 for 2-3x speedup
- WASM SIMD128: i8x16 operations for 1.5-2x speedup

Phase 3 - Graph Rewrite Passes:
- GR-1: BatchNorm fusion into Conv weights
- GR-2: Zero-point correction pre-computation
- GR-3: Q/DQ node insertion at FP32/INT8 boundaries
- GR-4: ReLU/HardSwish fusion with LUT

Phase 4 - Quantized Layers:
- QuantizedConv2d with per-channel quantization
- QuantizedDepthwiseConv2d for MobileNet
- QuantizedLinear for FC layers
- QuantizedMaxPool2d/AvgPool2d
- QuantizedResidualAdd with scale alignment

Phase 6 - Tests & Benchmarks:
- quality_validation.rs: cosine similarity ≥0.995
- acceptance_gates.rs: 7 ADR-091 gates
- kernel_equivalence.rs: SIMD vs scalar validation
- int8_bench.rs: Criterion benchmarks

Performance targets:
- 2.5x latency improvement (MobileNetV3)
- 4x memory reduction
- <1% accuracy degradation

Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit c39bf72. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Implements memory-aware expert routing with cache residency bonus:

## New moe/ Module (5 files, ~4,300 lines)
- router.rs: MemoryAwareRouter with cache bonus (0.15 default)
  - INV-6 compliant (deterministic tie-breaking)
  - PagingRequest generation for non-resident experts
- affinity.rs: EMA-based expert affinity tracking
  - INV-2 compliant (monotonic decay without activation)
  - top_k_by_affinity() for prefetch predictions
- precision_allocator.rs: Hot/warm/cold precision assignment
  - Frequency-based percentile thresholds
  - GGUF format mapping (Q4_K_M, Q3_K, Q2_K)
- sram_mapper.rs: Hardware memory hierarchy config
  - Presets: RPi5, Mobile, Desktop, WasmBrowser
  - Tier assignment (SRAM/DRAM/Storage)
- metrics.rs: MoE routing metrics tracking
  - Cache hit rate, paging latency, prefetch accuracy

## Extended bitnet/expert_cache.rs
- suggest_eviction_with_affinity(): Combined LRU/LFU + affinity
- prefetch_by_affinity(): Affinity-based expert prefetching
- hot_experts(): List currently cached experts

## Tests (131 total)
- 86 MoE unit tests
- 19 integration tests (GATE-1 through GATE-4 validation)
- 26 ExpertCache tests

## Benchmarks (9 suites)
- Routing overhead: ~22 ns (target: ≤15 μs) ✅
- Cache hit rate simulation
- Affinity update, precision allocation

Target: ≥70% cache hit rate vs 34% baseline

Co-Authored-By: claude-flow <ruv@ruv.net>
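The EMA affinity scheme described above can be sketched in a few lines: every routing step decays all scores, and only activated experts receive a boost, so an idle expert's score decreases monotonically (the INV-2 property). This is a hedged illustration; the class name `AffinityTracker`, the decay constant 0.95, and the boost rule are assumptions of this sketch, not values from affinity.rs.

```javascript
// Sketch of EMA-based expert affinity tracking (illustrative, not affinity.rs):
// decay everything each step, boost only the activated experts.
class AffinityTracker {
  constructor(numExperts, decay = 0.95) { // 0.95 is an assumed default
    this.scores = new Float32Array(numExperts);
    this.decay = decay;
  }
  // One routing step: monotonic decay for all, EMA-style boost for activated.
  update(activatedExperts) {
    for (let i = 0; i < this.scores.length; i++) this.scores[i] *= this.decay;
    for (const e of activatedExperts) this.scores[e] += 1 - this.decay;
  }
  // Candidates for prefetch, analogous to top_k_by_affinity().
  topKByAffinity(k) {
    return [...this.scores.keys()]
      .sort((a, b) => this.scores[b] - this.scores[a])
      .slice(0, k);
  }
}
```

An expert that stops being activated can only lose affinity over time, which is what lets the cache use these scores for eviction and prefetch decisions.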
HIGH severity security fixes:
- router: Change new() from panic to Result<Self, &'static str>
- router: Change with_default_affinity() to return Result
- precision_allocator: Change new() to return Result, add new_unchecked()
- sram_mapper: Change assign_tier() from assert! to returning bool

MEDIUM severity security fixes:
- router: Add NaN/Inf validation in apply_cache_bonus_inplace()
- router: Handle NaN in select_top_k(), treat as NEG_INFINITY
- affinity: Add NaN handling in top_k_by_affinity() with deterministic tie-breaking
- affinity: Add NaN handling in least_affinity() for eviction decisions
- sram_mapper: Fix division by zero in priority_score() when last_access=0

P0 performance optimizations:
- router: Add apply_cache_bonus_inplace() to avoid allocation in hot path
- router: Use select_nth_unstable_by for partial sort when k << n (O(n) vs O(n log n))

All 103 tests pass (84 unit + 19 integration).

Co-Authored-By: claude-flow <ruv@ruv.net>
SIMD decay optimization (affinity.rs):
- Add decay_scores_simd() with platform-specific implementations
- NEON intrinsics for ARM64 (4-wide vectorization)
- AVX2 intrinsics for x86_64 (8-wide vectorization)
- Scalar fallback for other platforms
- Handles non-aligned sizes with remainder loop

Bitmask cache residency (router.rs):
- Replace Vec<bool> with CacheMask bitmask structure
- u64 for ≤64 experts (single word, cache-friendly)
- Vec<u64> bitvector for >64 experts (larger models)
- Efficient popcount for resident_list()
- O(1) is_set/set operations via bitwise ops

Edge case tests added:
- Non-aligned SIMD sizes (1, 3, 5, 7, 9, 15, 17, 33, 65 experts)
- Large expert counts (256 experts)
- SIMD vs scalar correctness verification
- CacheMask with >64 experts (128 experts)
- Out-of-bounds handling
- Empty cache state

All 92 unit tests + 19 integration tests pass.

Co-Authored-By: claude-flow <ruv@ruv.net>
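The CacheMask idea above can be sketched as follows. This is an illustration of the single-word (≤64 experts) case only; JS lacks native u64 bitwise ops, so BigInt stands in here, and the method names mirror but do not reproduce the Rust API.

```javascript
// Sketch of the CacheMask residency bitmask (illustrative, not router.rs):
// one 64-bit word tracks which experts are cache-resident, giving O(1)
// set/clear/test via bitwise ops and popcount for the resident count.
class CacheMask {
  constructor() { this.bits = 0n; } // BigInt stand-in for a u64 word
  set(expert) { this.bits |= 1n << BigInt(expert); }
  clear(expert) { this.bits &= ~(1n << BigInt(expert)); }
  isSet(expert) { return ((this.bits >> BigInt(expert)) & 1n) === 1n; }
  residentCount() {
    // Kernighan popcount: each iteration clears the lowest set bit.
    let n = this.bits, count = 0;
    while (n) { n &= n - 1n; count++; }
    return count;
  }
}
```

The Rust version extends this with a Vec<u64> bitvector once the expert count exceeds 64; the single-word case is the cache-friendly common path.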
P2: Buffer reuse optimizations
- Add reusable score_buffer and index_buffer to avoid hot-path allocations
- Add route_into_buffer() using pre-allocated buffers
- Add apply_cache_bonus_inplace_buffer() for in-place operations
- Add select_top_k_buffered() using pre-allocated index buffer
- Add route_batch() for efficient batch token routing
- Add bulk metric recording methods (record_cache_hits/record_cache_misses)

P3: Branch hints for hot paths
- Add #[inline] attributes to all hot path methods
- route(), route_into_buffer(), apply_cache_bonus_inplace_buffer()
- select_top_k_buffered(), select_top_2_unrolled(), is_set(), set()

P4: Loop unrolling for small arrays
- Add select_top_2_unrolled() for common top-2 MoE configuration
- Single pass through scores to find best and second-best
- Avoids sorting overhead for the most common case

Performance impact:
- P2: Eliminates Vec allocations in hot routing path
- P3: Reduces function call overhead via inlining
- P4: 2x faster top-2 selection vs full sort

All 93 MoE tests pass.

Co-Authored-By: claude-flow <ruv@ruv.net>
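The P4 single-pass top-2 idea is simple enough to sketch directly: track the best and second-best indices in one sweep instead of sorting all scores. Illustrative JS only; the Rust select_top_2_unrolled() additionally does deterministic tie-breaking per INV-6, which this sketch omits.

```javascript
// Single-pass top-2 selection (the idea behind select_top_2_unrolled()):
// returns the indices of the two highest scores without sorting.
// Assumes scores.length >= 2; tie-breaking is not handled in this sketch.
function selectTop2(scores) {
  let best = 0, second = 1;
  if (scores[second] > scores[best]) [best, second] = [second, best];
  for (let i = 2; i < scores.length; i++) {
    if (scores[i] > scores[best]) {
      second = best;
      best = i;
    } else if (scores[i] > scores[second]) {
      second = i;
    }
  }
  return [best, second];
}
```

For the common top-2 MoE configuration this replaces an O(n log n) sort with a single O(n) pass, which is where the claimed 2x selection speedup comes from.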
Add comprehensive benchmarks for memory-aware router optimizations:
- bench_memory_aware_router: Tests MemoryAwareRouter performance
  - route_top2: P4 unrolled top-2 selection benchmark
  - route_batch_8: P2 batch routing with buffer reuse
  - cache_mask_check_64/128: P1 bitmask lookup performance
  - select_top2_vs_sort: Compare unrolled vs sorted selection
  - select_top4_partial_sort: Partial sort for larger K
- bench_simd_affinity_decay: Tests SIMD decay performance
  - decay_all: P1 SIMD-optimized decay across expert counts
  - update_with_activation: Combined decay + boost performance

Validates ADR-092 targets:
- Routing overhead <= 15 us
- Cache hit rate >= 70%

Co-Authored-By: claude-flow <ruv@ruv.net>
- Bump workspace version from 2.0.5 to 2.0.6
- Update README with ADR-090 (Pi-Quantization) features
- Update README with ADR-091 (INT8 CNN Quantization) features
- Update README with ADR-092 (MoE Memory-Aware Routing) features
- Published ruvllm v2.0.6 and ruvector-cnn v2.0.6 to crates.io

Co-Authored-By: claude-flow <ruv@ruv.net>
Co-Authored-By: claude-flow <ruv@ruv.net>
The _mm512_roundscale_ps intrinsic requires a compile-time constant for the rounding mode parameter. Changed from runtime let binding to const to fix CI compilation on AVX-512 systems. Co-Authored-By: claude-flow <ruv@ruv.net>
…re-routing feat(adr-090-092): Pi-Quantization, INT8 CNN, MoE Memory-Aware Routing
Built from commit 5a4edc1. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Built from commit 5a4edc1 Platforms: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc Co-Authored-By: claude-flow <ruv@ruv.net>
…rics

P0: Router buffer reuse optimization
- Add pre-allocated result_buffer to MemoryAwareRouter
- Eliminate collect() allocation in select_top_k_buffered()
- Use std::mem::take for zero-copy buffer handoff
- Expected savings: 1-2µs per routing call

P1: Optional routing metrics feature flag
- Add 'routing-metrics' feature (enabled by default)
- Conditionally compile Instant::now() and metrics tracking
- Allows production builds to avoid syscall overhead (~0.04-0.08µs)

Performance Analysis Documentation:
- MoE routing optimization analysis report
- Comprehensive architecture review (5 documents)
- Identifies 8 additional optimization opportunities

ADR-092 targets: <10µs routing latency, 70%+ cache hit rate

All 26 MoE router tests pass.

Co-Authored-By: claude-flow <ruv@ruv.net>
perf(ruvllm): Deep optimization for MoE routing and benchmark analysis
Built from commit fd3048c. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
)

* feat(ruvix): implement ADR-087 RuVix Cognition Kernel Phase A

Implements the complete Phase A (Linux-hosted) RuVix Cognition Kernel with 9 crates, 760 tests, and comprehensive documentation.

## Core Crates (9)
- ruvix-types: 6 kernel primitives (Task, Capability, Region, Queue, Timer, Proof)
- ruvix-cap: seL4-inspired capability management with derivation trees
- ruvix-region: Memory regions (Immutable, AppendOnly, Slab policies)
- ruvix-queue: io_uring-style lock-free IPC with zero-copy semantics
- ruvix-proof: 3-tier proof engine (Reflex <100ns, Standard <100us, Deep <10ms)
- ruvix-sched: Coherence-aware scheduler with priority computation
- ruvix-boot: 5-stage RVF boot loader with ML-DSA-65 signatures
- ruvix-vecgraph: Kernel-resident vector/graph stores with HNSW
- ruvix-nucleus: Unified kernel entry point with 12 syscalls

## Security (SEC-001, SEC-002)
- Boot signature failure: PANIC immediately, no fallback path
- Proof cache: 100ms TTL, single-use nonces, max 64 entries
- Capability delegation depth: max 8 levels with audit warnings

## Architecture
- no_std compatible for Phase B bare metal port
- Proof-gated mutation: every state change requires cryptographic proof
- Capability-based access control: no syscall without valid capability
- Zero-copy IPC via region descriptors (TOCTOU protected)

## Documentation
- Main README with architecture diagrams
- Individual crate READMEs with usage examples
- Architecture decision records

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: update ADR-087 status and add RuVix to root README
- Update ADR-087 status from Proposed to Accepted (Phase A Implemented)
- Add implementation status table with all 9 crates and 760 tests
- Document security invariants implemented (SEC-001 through SEC-004)
- Add collapsed RuVix section to root README with architecture diagram

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore: update ruvector-coherence dependency to 2.0.4 for crates.io publish

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(ruvix): implement ADR-087 Phase B bare metal AArch64 support

Phase B adds bare metal AArch64 support for the RuVix Cognition Kernel:

New crates:
- ruvix-hal: Hardware Abstraction Layer traits (~500 lines)
  - Console, InterruptController, Timer, Mmu, PowerManagement traits
  - Platform-agnostic design for ARM64/RISC-V/x86_64
  - 15 unit tests passing
- ruvix-aarch64: AArch64 boot and MMU support (~2,000 lines)
  - _start assembly entry, exception vectors
  - 4-level page tables with capability metadata
  - System register accessors (SCTLR_EL1, TCR_EL1, TTBR0/1)
  - Implements ruvix_hal::Mmu trait
- ruvix-drivers: Device drivers for QEMU virt (~1,500 lines)
  - PL011 UART driver (115200 8N1, FIFO, interrupts)
  - GIC-400 interrupt controller (256 IRQs, 16 priorities)
  - ARM Generic Timer (deadline scheduling)
  - Volatile MMIO with memory barriers (DMB, DSB, ISB)

Build infrastructure:
- aarch64-boot/ with linker script and custom Rust target
- QEMU virt runner integration (Cortex-A72, 128MB RAM)
- Makefile with build/run/debug targets

ADR-087 updated with:
- Phase B objectives and new crate specifications
- QEMU virt memory map (128MB RAM at 0x40000000)
- 5-stage boot sequence documentation
- Security enhancements and testing strategy
- Raspberry Pi 4/5 platform differences

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(ruvix): implement Phases C/D/E and QEMU swarm simulation

This adds full bare metal OS capabilities to the RuVix Cognition Kernel:

## Phase C: Multi-Core & DMA Support
- ruvix-smp: Symmetric multi-processing (256 cores, spinlocks, IPIs)
- ruvix-dma: DMA controller with scatter-gather
- ruvix-dtb: Device tree blob parser
- ruvix-physmem: Buddy allocator for physical memory

## Phase D: Raspberry Pi 4/5 Support
- ruvix-bcm2711: BCM2711/2712 SoC drivers (GPIO, mailbox, UART)
- ruvix-rpi-boot: RPi boot support (spin table, early UART)

## Phase E: Networking & Filesystem
- ruvix-net: Full network stack (Ethernet/ARP/IPv4/UDP/ICMP)
- ruvix-fs: Filesystem layer (VFS, FAT32, RamFS)

## QEMU Swarm Simulation
- qemu-swarm: Multi-QEMU cluster for distributed testing
- Network topologies: mesh, ring, star, tree
- Fault injection and chaos testing scenarios

## Summary
- 10 new crates, ~27,000 lines of code
- 400+ new tests passing
- ADR-087 updated with Phases C/D/E documentation
- Main README updated with all phases

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(ruvix): address critical security vulnerabilities CVE-001 through CVE-005

Security fixes applied from deep review audit:
- CVE-001 (CRITICAL): Add compile-time protection preventing `disable-boot-verify` feature in release builds. This closes a boot signature bypass vulnerability.
- CVE-002 (HIGH): Add MMIO address validation to GIC driver. `Gic::new()` now returns `Result<Self, GicError>` and validates addresses against known platform ranges. Added `new_unchecked()` for trusted callers.
- CVE-003 (HIGH): Add integer overflow protection in DTB parser. All offset calculations now use `checked_add()` to prevent buffer overflow via crafted DTB files.
- CVE-005 (HIGH): Add IPv4 header validation ensuring `total_length >= header_len` per RFC 791.

Also includes test fixes:
- Mark hardware-dependent tests as `#[ignore]` (MMIO, ARM timer)
- Fix swap32 test assertion in rpi-boot
- Update doctests for new GIC API

All 259 tests pass across affected crates.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(ruvix): implement CLI, kernel shell, and PBFT consensus

Implements Phase F features for the RuVix Cognition Kernel:

CLI (ruvix-cli):
- build: Cross-compile kernel for AArch64 targets
- config: Manage kernel configuration files
- dtb: Device tree blob operations (validate, dump, compile, compare, search)
- flash: UART/serial flash operations with progress reporting
- keys: Ed25519 key management with secure storage
- monitor: Real-time kernel metrics dashboard
- security: Security audit and vulnerability scanning

Kernel Shell (ruvix-shell):
- Interactive command parser with history support
- Commands: help, info, mem, tasks, caps, vectors, witness, proofs, queues, perf, cpu, trace, reboot
- Configurable prompt with trace mode indication
- Shell backend integration with nucleus kernel

PBFT Consensus (qemu-swarm):
- Full PBFT implementation (pre-prepare, prepare, commit phases)
- View change protocol for leader recovery
- Checkpoint mechanism for state synchronization
- Custom serde wrappers for fixed-size byte arrays (Signature, HashDigest)
- Byzantine fault tolerance (f < n/3)

Additional:
- Example RVF swarm consensus demo
- Nucleus shell backend for kernel introspection
- Fixed chrono DateTime type annotation in keys.rs

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore(ruvix): add version specs for crates.io publishing
- Add version = "0.1.0" to ruvix-dtb dependency in CLI
- Add README.md for ruvix-shell crate

Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: Reuven <cohen@ruv-mac-mini.local>
4-phase plan for retraining RuvLTRA models on GCloud:
- Phase 1: TurboQuant-calibrated GGUF quantization (imatrix recalibration)
- Phase 2: WET-augmented SFT + DPO fine-tuning on brain knowledge + Common Crawl
- Phase 3: Benchmarking suite (HumanEval, SWE-Bench, TurboQuant quality, latency)
- Phase 4: Publishing updated models to HuggingFace with -tq variants

Uses existing phi4-finetuning-gpu Cloud Run template, Vertex AI for training, and brain-wet-daily pipeline for data. Estimated cost: ~$70.

Co-Authored-By: claude-flow <ruv@ruv.net>
Correct TurboQuant scope (runtime KV-cache only, not weight quant), add Current Gaps section, document existing training infrastructure (13 components), clarify LoRA-based fine-tuning approach, reference related ADRs (049, 090, 093). Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit 968fe21. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Built from commit ed93997. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
…blation

Addresses review feedback:
- Add dataset governance: record schema, source allowlist, dedup rules, eval contamination checks, quality scoring
- Add release gate: 7 ship/no-ship criteria (G1-G7) with automated release_gate.py checker
- Add ablation matrix: 5 runs (A-E) isolating imatrix, SFT, DPO, TQ
- Add rollback plan: HF git revert, registry rollback, npm patch
- Add TurboQuant serving plan: .turboquant.json sidecar config, runtime discovery, per-layer profiling
- Relabel cost estimate as "initial experimental compute only"
- Update status to "proposed, pending governance hardening"
- Expand next steps to 21 items across 4 phases

Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit e265141. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Training tooling:
- release_gate.py: Automated 7-gate ship/no-ship checker (G1-G7)
- export_training_data.py: Dataset export with governance (schema, dedup, quality scoring, contamination check)
- contamination_check.py: 13-gram eval contamination detection
- run_calibration.py: Phase 1 imatrix + TurboQuant profiling
- run_sft.py: Phase 2 LoRA SFT + DPO training
- deploy_training.sh: Cloud Run job creation + Vertex AI setup
- Dockerfile: GPU training image (transformers + peft + trl)

Rust infrastructure:
- turboquant_profile.rs: .turboquant.json sidecar config loading, per-layer TQ config discovery, default profiles

Ref: ADR-129, ruvnet#310

Co-Authored-By: claude-flow <ruv@ruv.net>
- nightly_train.sh: 5-phase nightly pipeline (export brain learnings, contamination check, incremental LoRA, release gates, push to HF)
- Updated deploy_training.sh with nightly Cloud Run job + scheduler
- Updated ADR-129 with nightly continuous learning section

Schedule: daily 03:00 UTC, ~$4/day, skips if <10 new records. All 7 release gates must pass before publishing.

Ref: ruvnet#310

Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit 063c838. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Built from commit 8289823. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
98 brain memories + 131 ADRs + 1 routing reference. Governance: SHA-256 dedup, quality >= 0.5, schema validated. Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit e6d4f50. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
GPU-enabled Cloud Run jobs have a maximum timeout of 1 hour. The previous 7200s (2hr) setting was rejected by the API. Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit b866cd5. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
- Rewrite run_calibration.py to use gguf Python package + llama-cpp-python prebuilt wheels instead of compiling llama.cpp from source
- Simplify Dockerfile: single-stage, pip install only, no CUDA compilation (build time: ~5min vs 20+min)
- Update ADR-129 with tooling decision section explaining ruvllm-native choice
- Remove llama-imatrix and llama-quantize binary dependencies

Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit 3dc7753. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
The pip install of llama-cpp-python from source requires ninja + cmake for CUDA compilation. Use the prebuilt wheel from the cu124 index instead. Falls back to source install, then transformers-only mode. Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit f220d3b. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
- Add libgomp1 (required by llama-cpp-python OpenMP)
- Use PyTorch cu124 index for proper CUDA wheel
- Set default CMD with --model-id for Cloud Run execution
- Consolidate pip installs for Docker layer cache efficiency

Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit 63c68bc. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit 6f4b3d4. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Phase 1 calibration deployed and executed on GCloud L4 GPU. Infrastructure: Docker image built (torch 2.5.1+cu124), 3 Cloud Run jobs deployed, 2 schedulers enabled. Training corpus exported. Release gate automation tested. TurboQuant sidecars on HuggingFace. Co-Authored-By: claude-flow <ruv@ruv.net>
… models Status: Accepted. ruvltra-small complete, 3 remaining models executing on L4 GPU (ruvltra-medium, ruvltra-claude-code, ruvltra). Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit bab9f45. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Built from commit e7ad2af. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
Calibration results (L4 GPU):
- ruvltra-small: 75.4 tok/s
- ruvltra-medium: 62.6 tok/s
- ruvltra-claude-code: 67.1 tok/s
- ruvltra: pending final execution

TQ profiles + benchmark_results.json uploaded to all HuggingFace models.

Co-Authored-By: claude-flow <ruv@ruv.net>
Built from commit 04ed5b8. Platforms updated: linux-x64-gnu, linux-arm64-gnu, darwin-x64, darwin-arm64, win32-x64-msvc. 🤖 Generated by GitHub Actions
…mport()

Issue: ruvnet#315 - import() populates this.memories but never updates HNSW index. After engine.import(data), recall() always returns [] because HNSW search returns empty results and brute-force fallback never triggers (empty array, not error).

Changes:
- Add importAsync() that calls import() + rebuildHnswIndex()
- Add rebuildHnswIndex() to re-insert memories into HNSW
- Fix recall() to fall through to brute-force when HNSW returns empty results
- Document that import() is sync and doesn't rebuild HNSW

Fixes: ruvnet#315
Summary
Fixes #315
After calling `engine.import(data)`, `engine.recall()` always returns `[]` even though memories are present in `this.memories`. The HNSW vector index is not rebuilt during import, and the brute-force fallback never triggers because empty results don't throw errors.

Changes
- `importAsync(data, merge)` - Async version that calls `import()` + `rebuildHnswIndex()`
- `rebuildHnswIndex()` - Re-inserts all memories into HNSW index
- `recall()` - Fall through to brute-force when HNSW returns empty results (not just on errors)

Root Cause
`import()` populates `this.memories` but never calls `vectorDb.insert()`. After import, `recall()` calls `vectorDb.search()`, receives an empty array, and returns it immediately. The catch block only triggers on thrown errors, not on empty results.

Usage
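A minimal stand-alone sketch of the fixed flow, using simplified stand-ins for the engine internals (the `MemoryEngine` shape, the `vectorDb` insert/search API, and the export format `{ memories: [...] }` are assumptions of this sketch, not the actual PR code):

```javascript
// Hedged sketch of the fix: importAsync() rebuilds HNSW after import(), and
// recall() falls through to brute force on EMPTY results, not only on errors.
class MemoryEngine {
  constructor(vectorDb) {
    this.memories = new Map(); // id -> { id, embedding, ... }
    this.vectorDb = vectorDb;
  }

  // Sync import: populates this.memories but does NOT rebuild the HNSW index.
  import(data, merge = false) {
    if (!merge) this.memories.clear();
    for (const m of data.memories) this.memories.set(m.id, m);
  }

  // Async import: import() + rebuildHnswIndex(), so recall() works afterwards.
  async importAsync(data, merge = false) {
    this.import(data, merge);
    await this.rebuildHnswIndex();
  }

  // Re-insert every memory into the HNSW-backed vector store.
  async rebuildHnswIndex() {
    for (const [id, m] of this.memories) {
      await this.vectorDb.insert(id, m.embedding);
    }
  }

  async recall(queryEmbedding, k = 5) {
    try {
      const hits = await this.vectorDb.search(queryEmbedding, k);
      if (hits.length > 0) return hits;
      // empty result: fall through to brute force instead of returning []
    } catch (_) {
      // thrown error: also fall through to brute force
    }
    return this.bruteForceSearch(queryEmbedding, k);
  }

  // Dot-product brute force over this.memories (fallback path).
  bruteForceSearch(q, k) {
    const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);
    return [...this.memories.values()]
      .map((m) => ({ ...m, score: dot(q, m.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

With this shape, `await engine.importAsync(data)` followed by `engine.recall(query)` returns results, and even a bare `import()` recovers via the brute-force fallback.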
Test Plan
- `importAsync()` and verify `recall()` returns results
- `import()` + `rebuildHnswIndex()` and verify `recall()` returns results

Fixes: #315