
feat: multi-provider embeddings, distillation vector search, and cross-project recall #58

Merged
BYK merged 1 commit into main from feat/embedding-providers-and-cross-project on Apr 9, 2026
Conversation

@BYK BYK (Owner) commented Apr 9, 2026

Summary

  • Embedding provider abstraction: Refactored embedding.ts from hardcoded Voyage AI to an EmbeddingProvider interface with Voyage and OpenAI implementations. Config gets a provider field ("voyage" | "openai"), each reading its own env var. Fully backward-compatible — existing configs default to "voyage".

  • Distillation vector search: Schema migration 9 adds embedding BLOB to distillations table. Distillations are embedded fire-and-forget on store (both gen-0 and meta-distillation). Brute-force cosine similarity search feeds into the recall tool's RRF alongside FTS results, improving semantic recall over session history.

  • Cross-project knowledge discovery: The recall tool now searches knowledge entries from other projects when scope is "all". Results are tagged with the source project name (e.g., [knowledge/Architecture from: other-project]) and naturally rank lower via RRF since they're a separate list. This surfaces relevant knowledge you've captured in project A when working in project B.
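The provider abstraction described above could look roughly like the following sketch. The interface, class, and factory names here are illustrative assumptions, not the actual contents of `src/embedding.ts`:

```typescript
// Hypothetical sketch of the EmbeddingProvider abstraction; the real
// interface in src/embedding.ts may differ in shape and naming.
interface EmbeddingProvider {
  // Embed a batch of texts into float vectors.
  embed(texts: string[]): Promise<number[][]>;
}

type ProviderName = "voyage" | "openai";

// Each provider reads its own API key from the environment.
class VoyageProvider implements EmbeddingProvider {
  readonly apiKey = process.env.VOYAGE_API_KEY ?? "";
  async embed(texts: string[]): Promise<number[][]> {
    // ... POST to the Voyage embeddings endpoint (elided) ...
    return texts.map(() => []);
  }
}

class OpenAIProvider implements EmbeddingProvider {
  readonly apiKey = process.env.OPENAI_API_KEY ?? "";
  async embed(texts: string[]): Promise<number[][]> {
    // ... POST to the OpenAI embeddings endpoint (elided) ...
    return texts.map(() => []);
  }
}

// Backward-compatible factory: a missing config value defaults to Voyage.
function makeProvider(name: ProviderName = "voyage"): EmbeddingProvider {
  return name === "openai" ? new OpenAIProvider() : new VoyageProvider();
}
```

Keeping the provider choice behind a single factory is what makes the change backward-compatible: existing configs with no `provider` field resolve to the Voyage path untouched.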

Motivation

Inspired by analysis of MemPalace's benchmark approach: raw verbatim text + embedding search scores 96.6% on LongMemEval vs ~70% for BM25/keyword search. The 26pp gap is entirely embedding quality. Extending Lore's existing embedding infrastructure to distillations (semantically rich summaries, ~10-50 per project) is the highest-value improvement at lowest cost.

Files Changed

| File | Change |
|------|--------|
| `src/embedding.ts` | Provider interface + Voyage/OpenAI classes, distillation vector search/embed/backfill |
| `src/config.ts` | `provider` field in embeddings config |
| `src/db.ts` | Migration 9 (distillation embedding BLOB), `projectName()` helper |
| `src/distillation.ts` | Embed distillations on store |
| `src/ltm.ts` | `searchScoredOtherProjects()` |
| `src/reflect.ts` | Distillation vector search + cross-project discovery in recall tool |
| `src/index.ts` | Distillation embedding backfill on startup |
| `test/db.test.ts` | Schema version 8 → 9 |
| `test/embedding.test.ts` | `resetProvider()` for test isolation |
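As a rough illustration of how the brute-force cosine search and RRF merge fit together (function names and the `k = 60` constant are assumptions for this sketch, not values taken from the codebase):

```typescript
// Hypothetical sketch: cosine similarity over raw vectors, plus
// Reciprocal Rank Fusion to merge vector and FTS result lists.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// RRF: each ranked list contributes 1 / (k + rank) per document, so an
// item ranked highly in any one list still surfaces in the fused order.
function rrf(lists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}
```

Because cross-project results arrive as a separate list, RRF gives them fewer contributing lists than same-project hits, which is why they "naturally rank lower" without any explicit penalty term.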

Testing

  • All 294 tests pass (bun test)
  • TypeScript compilation clean (npx tsc --noEmit)

@BYK BYK enabled auto-merge (squash) April 9, 2026 13:21
…s-project recall

- Abstract embedding provider interface with Voyage AI and OpenAI support.
  Config gets a 'provider' field (default: voyage, backward-compatible).
  Each provider reads its own env var (VOYAGE_API_KEY, OPENAI_API_KEY).

- Extend vector search to distillations: schema migration 9 adds embedding
  BLOB to distillations table, fire-and-forget embed on store, brute-force
  cosine search feeds into recall RRF alongside FTS results.

- Cross-project knowledge discovery in recall tool: when scope is 'all',
  searches knowledge entries from other projects and surfaces them tagged
  with the source project name.
@BYK BYK force-pushed the feat/embedding-providers-and-cross-project branch from bf195a7 to 6f616ec on April 9, 2026 13:23
@BYK BYK merged commit 0d74697 into main Apr 9, 2026
1 check passed
@BYK BYK deleted the feat/embedding-providers-and-cross-project branch April 9, 2026 13:24
@craft-deployer craft-deployer bot mentioned this pull request Apr 9, 2026
BYK added a commit that referenced this pull request Apr 9, 2026
## Summary

- **Content-aware deduplication** in the gradient transform pipeline:
detects repeated tool outputs (same file read multiple times, identical
command results) and replaces earlier occurrences with compact
annotations, keeping only the latest.
- Two dedup levels: **exact content hash** (identical outputs) and
**same-file-path reads** (different content from edits between reads).
- Runs as a pre-pass before layer selection (between layer 0 and layer
1), reducing token pressure so sessions can stay at lower, less lossy
gradient layers.

## Motivation

Inspired by Dirac's ContextManager approach: in long coding sessions,
the same file is often read 2-5 times (explore → edit → verify). Each
read stores the full content as tokens. A 500-line file appearing 3
times costs ~15K tokens; after dedup: ~5.1K. Those saved tokens can be
the difference between layer 1 (clean window eviction, prompt caching
preserved) and layer 2 (tool stripping, cache busted).

## Design

- `deduplicateToolOutputs(messages, currentTurnIdx)` scans all completed
tool parts, groups by content hash and file path, keeps the latest
occurrence, replaces earlier ones with `dedupAnnotation()`.
- **Current turn is sacred** — never touched.
- **Tool parts are never removed** — only `state.output` is replaced
(preserves tool_use/tool_result pairing).
- **Small outputs skipped** — outputs below 600 chars aren't
deduplicated (annotation would cost more than the original).
- **Zero-cost no-op** — returns original array reference when no
duplicates exist.
- Follows existing `toolStripAnnotation()` pattern.
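A minimal sketch of the exact-content-hash dedup level, assuming a simplified `ToolPart` shape and a hypothetical 600-char threshold constant (the real implementation also groups by file path and uses `simpleHash()` rather than raw strings as map keys):

```typescript
// Hypothetical sketch of exact-match tool-output dedup; the real
// deduplicateToolOutputs() in src/gradient.ts differs in detail.
interface ToolPart { output: string }

const MIN_DEDUP_CHARS = 600; // assumed threshold: small outputs are skipped

function dedupAnnotation(): string {
  return "[duplicate tool output elided; see later occurrence]";
}

// Keep only the latest occurrence of each identical output; replace
// earlier ones with a compact annotation. Parts at or beyond
// currentTurnIdx (the current turn) are never touched.
function deduplicateToolOutputs(
  parts: ToolPart[],
  currentTurnIdx: number,
): ToolPart[] {
  const lastSeen = new Map<string, number>();
  for (let i = 0; i < currentTurnIdx; i++) {
    if (parts[i].output.length >= MIN_DEDUP_CHARS) {
      lastSeen.set(parts[i].output, i);
    }
  }
  let changed = false;
  const out = parts.map((p, i) => {
    const last = lastSeen.get(p.output);
    if (i < currentTurnIdx && last !== undefined && last > i) {
      changed = true;
      return { output: dedupAnnotation() }; // only output replaced, part kept
    }
    return p;
  });
  // Zero-cost no-op: hand back the original array reference when
  // nothing was deduplicated.
  return changed ? out : parts;
}
```

Replacing only `output` while leaving the part in place is what preserves tool_use/tool_result pairing; removing the part entirely would desynchronize the message structure.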

## Files Changed

| File | Change |
|------|--------|
| `src/gradient.ts` | `deduplicateToolOutputs()`, `dedupAnnotation()`, `simpleHash()`, `extractFilePath()` + integration into `transformInner()` |
| `test/gradient.test.ts` | 7 new tests: exact-match, same-file, current-turn protection, small-output skip, no-change passthrough, bash dedup, triple-read |

## Testing

- All 301 tests run, 298 pass, 3 fail (pre-existing `vectorSearch` test
isolation issue from #58, not introduced here)
- `npx tsc --noEmit` clean
