Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,15 @@ Wire it into Claude Code (`~/.claude.json`):

`--read-only` opens the DB with a shared lock and hides the `execute` tool. Full docs + the other six tools' references in [`docs/mcp.md`](docs/mcp.md).

### End-to-end example apps

Beyond the per-language quickstarts in [`examples/`](examples/), the SQLR-38 umbrella tracks longer, opinionated example apps that exercise SQLRite in real-world shapes:

| App | SDK | What it shows |
|---|---|---|
| [Python LLM agent with persistent memory](examples/python-agent/) | Python | Vector + lexical recall, fact extraction, summaries — all in one `.sqlrite` file |
| [Chat-with-your-notes via Claude Desktop MCP](examples/nodejs-notes/) | Node.js | Markdown → hybrid HNSW + BM25 index → `sqlrite-mcp --read-only` → Claude Desktop |

### Developer guide

In-depth documentation lives under [`docs/`](docs/). Start at [`docs/_index.md`](docs/_index.md) — it navigates to:
Expand Down
11 changes: 11 additions & 0 deletions examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ Beyond the per-SDK quick-start tours above, the [SQLR-38 umbrella](../docs/roadm
| App | Language / SDK | What it shows | Directory |
|---|---|---|---|
| LLM agent with persistent memory | Python | Vector + lexical recall, fact extraction, summaries — all in one `.sqlrite` file | [`python-agent/`](python-agent/) |
| Chat with your notes (MCP) | Node.js | Markdown → SQLRite hybrid retrieval, served to Claude Desktop via `sqlrite-mcp --read-only` | [`nodejs-notes/`](nodejs-notes/) |

## Running the Rust quickstart

Expand Down Expand Up @@ -82,6 +83,16 @@ python -m sqlrite_agent # works offline; no API key required

A full CLI chat agent whose long-term memory is one `.sqlrite` file. Embeds each turn, hybrid-searches over past messages and a structured `facts` table on every recall, and survives process restarts. Read [`python-agent/README.md`](python-agent/README.md) for the demo script and architecture diagram.

## Running the Node.js notes assistant (SQLR-40)

```bash
cd examples/nodejs-notes
npm install
node bin/sqlrite-notes.mjs init ~/Documents/notes
```

Ingests a folder of markdown notes into a `notes.sqlrite` file with HNSW + BM25 indexes, then `sqlrite-notes serve` wraps `sqlrite-mcp --read-only` so **Claude Desktop / any MCP client** can `bm25_search` / `vector_search` / `query` / `ask` your local notes directly — no cloud sync, no third-party indexer. Default embedder is fully offline (deterministic hash bag-of-words); flip to `--embedder openai` with `OPENAI_API_KEY` set for real semantic recall. Read [`nodejs-notes/README.md`](nodejs-notes/README.md) for the Claude Desktop config snippet and the hybrid-retrieval SQL walkthrough.

## Running the Node.js sample

```bash
Expand Down
6 changes: 6 additions & 0 deletions examples/nodejs-notes/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
node_modules/
*.sqlrite
*.sqlrite-journal
.env
.env.local
test-fixtures-tmp/
344 changes: 344 additions & 0 deletions examples/nodejs-notes/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,344 @@
# sqlrite-notes — chat with your markdown notes via Claude Desktop

A Node.js CLI that ingests a folder of markdown notes (Obsidian
vault, Notion export, plain `~/Documents/notes`) into a SQLRite
database, then exposes the database to **Claude Desktop / any MCP
client** through the engine's first-party MCP server.

End-user effect: drop your notes folder in, paste one block into
Claude Desktop's config, and ask Claude *"what did I write about
CRDTs last month?"* — it answers using your local notes. No cloud
sync, no third-party indexer, the entire memory is one `.sqlrite`
file on disk you can open in the REPL.

> **Why this example?** Other "chat with your notes" demos build a
> custom RAG pipeline and bolt it onto a model. This one shows that
> when the database itself speaks the agent protocol, you don't need
> a pipeline — *Claude drives the database directly* via
> `sqlrite-mcp`. The Node.js side is just the ingest + glue.

## Architecture

```mermaid
flowchart LR
Notes[/"~/notes/*.md<br/>(markdown)"/] -->|sqlrite-notes init / refresh| Ingest
Ingest["Ingest pipeline<br/>(walk → chunk → embed → store)"] --> DB[("notes.sqlrite<br/>documents · chunks<br/>HNSW + FTS indexes")]
DB -->|"sqlrite-mcp --read-only<br/>stdio JSON-RPC"| Claude["Claude Desktop<br/>(or any MCP client)"]
Claude -->|"vector_search · bm25_search · query · ask"| DB
```

The whole stack: Node.js for the **write side** (ingest pipeline,
chunking, embeddings), SQLRite for **storage + retrieval primitives**
(HNSW vector index, BM25 inverted index, raw SQL), and `sqlrite-mcp`
for the **read side** that Claude actually talks to. The Node CLI
never touches the database while Claude is connected — that's what
`--read-only` is for.

## Schema (v1)

| Table | Purpose | Indexes |
|-------------|------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| `documents` | One row per `.md` file — path, title, mtime, full body, content hash. | UNIQUE on `path`. FTS on `content` (BM25 over whole docs). |
| `chunks` | One row per ~400-token slice of a document, plus a `VECTOR(384)` embedding. | HNSW on `embedding` (semantic KNN). FTS on `content` (passage BM25). |

Hybrid retrieval queries `chunks` and fuses BM25 + vector cosine in a
single `ORDER BY` (see [`docs/fts.md`](../../docs/fts.md) for the
SQL pattern; the executor's `try_fts_probe` hook serves the top-k
straight from the inverted index).

## Install

The example lives inside the SQLRite monorepo for now (the umbrella
ticket SQLR-38 will lift it into its own repo once we've shipped a
few more).

```bash
git clone https://github.com/joaoh82/rust_sqlite
cd rust_sqlite/examples/nodejs-notes
npm install
```

`npm install` pulls **`@joaoh82/sqlrite`** (pinned to `^0.10.0`) with
prebuilt napi-rs binaries for macOS-arm64, Linux x64/arm64, and
Windows x64 — no Rust toolchain required for the Node side.

`sqlrite-mcp` is a separate Rust binary. Install it once, anywhere
on your `PATH`:

```bash
# from crates.io (~30s):
cargo install sqlrite-mcp

# or grab a prebuilt binary from GitHub Releases:
# https://github.com/joaoh82/rust_sqlite/releases
```

If you don't want to install globally, set `SQLRITE_MCP_BIN` to its
absolute path — `sqlrite-notes serve` will pick it up.

## Run

```bash
# 1. Ingest a folder of markdown into a notes.sqlrite database.
node bin/sqlrite-notes.mjs init ~/Documents/notes

# 2. Confirm it works locally — same retrieval shape Claude will see.
node bin/sqlrite-notes.mjs search "what did I learn about CRDTs?"

# 3. Wire up Claude Desktop using the snippet printed by `init`
# (also available any time via `sqlrite-notes config`).

# 4. Open Claude Desktop. The sqlrite-mcp tools appear in the
# tool picker — `bm25_search`, `vector_search`, `query`, `ask`,
# plus `list_tables` / `describe_table` / `schema_dump`.
```

Once you've added the snippet to `claude_desktop_config.json` and
restarted Claude Desktop, run a chat like:

> *"Summarize what I've written about Postgres over the last month."*

Claude will call `bm25_search` (and/or `vector_search`) against the
`chunks` table, get back the matching passages, and answer with
inline citations to the file path and chunk number.

## Zero-config: works fully offline

The default embedder is a **deterministic hash-based bag-of-words**
embedder that runs in pure JavaScript. No API key, no network,
nothing to install — `sqlrite-notes init ~/Documents/notes` works
on a fresh laptop.

Hybrid retrieval still beats either signal alone because BM25 is
already doing exact-term ranking; the hash embedder mostly carries
its weight via the long-tail of co-occurring tokens.

For real semantic recall, switch to OpenAI:

```bash
export OPENAI_API_KEY=sk-...
node bin/sqlrite-notes.mjs init ~/Documents/notes --embedder openai
```

Uses `text-embedding-3-small` with the `dimensions: 384` override
so it matches the schema. Override the model with
`--openai-model text-embedding-3-large` (and bump `--dim` if you
want full-fat dimensionality).

## CLI surface

```
sqlrite-notes init <dir> Ingest <dir> into the notes DB (replaces the index).
sqlrite-notes refresh <dir> Re-ingest only files whose mtime/hash changed.
sqlrite-notes search "<query>" Hybrid retrieval against the index (debug).
sqlrite-notes serve Spawn sqlrite-mcp --read-only against the DB.
sqlrite-notes stats Row counts.
sqlrite-notes config Print the Claude Desktop config snippet again.
```

Common flags:

```
--db <path> Path to the SQLRite database file. Default: ~/.sqlrite-notes/notes.sqlrite
--embedder hash|openai Embedding provider. Default: hash (offline).
--dim <N> Vector dimension. Default: 384.
--openai-model <name> OpenAI embedding model. Default: text-embedding-3-small.
--chunk-tokens <N> Target chunk size in tokens. Default: 400.
--chunk-overlap <N> Chunk overlap in tokens. Default: 60.
```

## Claude Desktop config

`sqlrite-notes init` prints this block; you can also regenerate it
any time with `sqlrite-notes config --db <path>`:

```json
{
"mcpServers": {
"sqlrite-notes": {
"command": "sqlrite-notes",
"args": ["serve", "--db", "/Users/you/.sqlrite-notes/notes.sqlrite"]
}
}
}
```

Where it lives:

| Platform | Path |
|---|---|
| macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Linux | `${XDG_CONFIG_HOME:-~/.config}/Claude/claude_desktop_config.json` |
| Windows | `%APPDATA%\Claude\claude_desktop_config.json` |

Merge the snippet into any existing `mcpServers` block; don't
overwrite the file wholesale.

### Try it without Claude Desktop

If you don't have Claude Desktop, the same MCP server works with
**any** MCP client — Cursor, Codex, your own. The fastest way to
verify the wiring is Anthropic's inspector:

```bash
npx @modelcontextprotocol/inspector sqlrite-notes serve --db ./notes.sqlrite
```

Open the URL it prints, click through the tools, type JSON args.
Saves a lot of "did Claude restart correctly?" debugging.

## Open the DB yourself

The notes index is plain SQLRite — open it in the REPL whenever:

```bash
$ cargo install sqlrite-engine # or grab a release binary
$ sqlrite ./notes.sqlrite
SQLRite v0.10.0
sqlrite> SELECT path, title FROM documents ORDER BY mtime DESC LIMIT 5;
sqlrite> SELECT path, ord, substr(content, 1, 80)
...> FROM chunks
...> JOIN documents ON chunks.document_id = documents.id
...> WHERE fts_match(chunks.content, 'rust embedded')
...> ORDER BY bm25_score(chunks.content, 'rust embedded') DESC
...> LIMIT 5;
```

This is the demo's whole point: **the notes index is just SQL**. You
can query it, back it up, copy it between machines, or feed the same
file into the Python / Go / WASM SDKs without converting anything.

## How hybrid retrieval works

The `search` command (and Claude, indirectly, when it composes
`query` calls) runs the canonical hybrid shape from
[`docs/fts.md`](../../docs/fts.md):

```sql
SELECT id, document_id, ord, content FROM chunks
WHERE fts_match(content, 'collaborative editing CRDT')
ORDER BY 0.5 * bm25_score(content, 'collaborative editing CRDT')
+ 0.5 * (1.0 - vec_distance_cosine(embedding, [/* query embedding */]))
DESC LIMIT 5;
```

Two things worth noting:

1. `vec_distance_cosine` returns a *distance* (`1 - cos(a, b)`).
Hybrid scoring wants "higher is better", so we invert it.
2. `fts_match` pre-filters before scoring. Paraphrases with zero
shared tokens never get scored — a deliberate tradeoff. If a
query produces no FTS tokens at all (e.g. a single non-ASCII
word), `db.hybridSearch()` falls back to vector-only so Claude
always has *something* to ground on.

The `-w 0..1` flag on `sqlrite-notes search` tunes the BM25 / vector
balance. At `-w 1` you get pure BM25; at `-w 0` pure vector cosine.
Default is 0.5.

## Known limitations

- **Concurrent `refresh` while `serve` is running.** SQLRite shipped
`BEGIN CONCURRENT` writes in v0.10 (the SQLR-22 / Phase 11 work),
so reading from `sqlrite-mcp --read-only` while `sqlrite-notes
refresh` writes is supported in principle. *In practice we
recommend stopping Claude Desktop's MCP connection during a
refresh* — Claude Desktop caches the tool schemas at server-spawn
time and won't notice newly-ingested files until you reload it
anyway, so the gain from coexisting isn't worth the surprise.

- **HNSW after delete + re-insert (engine bug, SQLR-8).** The
engine's HNSW chunk index panics when rows are deleted and re-
inserted within the same connection lifetime — `index out of
bounds` inside `DistanceMetric::compute`. The ingest pipeline
works around it by splitting refresh into a delete-phase →
close/reopen → insert-phase, which forces a clean index rebuild
on next open. We pay the close/reopen hop only when there are
actual deletions (so first-time `init` skips it). Once SQLR-8
lands in the engine, `src/ingest.mjs` can drop the `db.reopen()`
call.

- **Aggregates limited.** The engine supports `COUNT(*)`,
`SUM`/`AVG`/`MIN`/`MAX` but not arbitrary expressions in `SELECT`
projection beyond aggregates (see [`docs/supported-sql.md`](../../docs/supported-sql.md)).
The `stats` command sticks to `COUNT(*)`.

- **No parameter binding yet.** Engine SDKs don't yet accept `?`
placeholders ([Phase 5a.2](../../docs/roadmap.md) follow-up); all
SQL strings inline literals via the `q()` helper in
[`src/sqlutil.mjs`](src/sqlutil.mjs). That helper handles
quoting + vector-literal encoding for the four value types we use
(string / int / vector / null).

- **Re-ingest persistence.** `refresh` requires you to pass the
source directory again (we don't yet store it inside the DB). If
you forget the path, run `sqlrite-notes stats --db <path>` to
confirm the index exists, then re-run `init` to rebuild from
scratch — it's idempotent.

- **`--watch` mode** — not in v1. Re-run `refresh` manually after
adding notes.

- **Authentication.** None. This is a single-user local tool; the
MCP server inherits the spawner's filesystem privileges. Don't
point it at a database you wouldn't share with whoever can read
your home directory.

## Development

```bash
npm install
npm test # offline; runs all 40 unit + integration tests
```

The test suite uses `node:test` and exercises:

- `chunker` (frontmatter, title derivation, overlap windowing)
- `sqlutil` (every value type the engine accepts as a SQL literal)
- `embeddings` (hash determinism + a mocked OpenAI HTTP call)
- `db` (schema migration, upsert/delete/cascade, hybrid retrieval)
- `ingest` (end-to-end against the markdown fixtures in
[`test/fixtures/`](test/fixtures/))
- `serve` + `claude-config` (binary lookup, snippet shape)

A few tests need the `@joaoh82/sqlrite` engine binding installed;
they skip cleanly (with a message) if it isn't.

## Layout

```
examples/nodejs-notes/
├── package.json # @joaoh82/sqlrite pinned to ^0.10.0
├── README.md # this file
├── bin/
│ └── sqlrite-notes.mjs # entry — calls src/cli.mjs
├── src/
│ ├── cli.mjs # argv parsing + command dispatch
│ ├── config.mjs # defaults + path resolution
│ ├── sqlutil.mjs # q() + ident() — SQL literal helpers
│ ├── db.mjs # schema, migrations, every SQL string
│ ├── chunker.mjs # frontmatter + heading-aware chunking
│ ├── embeddings.mjs # hash + OpenAI embedders
│ ├── ingest.mjs # plan → delete → reopen → insert
│ ├── search.mjs # hybrid retrieval driver
│ ├── serve.mjs # spawn sqlrite-mcp --read-only
│ └── claude-config.mjs # Claude Desktop snippet renderer
└── test/ # node --test suite
├── fixtures/ # 3 short markdown notes
├── chunker.test.mjs
├── claude-config.test.mjs
├── db.test.mjs # integration — needs @joaoh82/sqlrite
├── embeddings.test.mjs
├── ingest.test.mjs # integration — needs @joaoh82/sqlrite
├── serve.test.mjs
└── sqlutil.test.mjs
```

The example binds only to the documented public surfaces:
[`@joaoh82/sqlrite`](../../sdk/nodejs/) (`Database`, `Statement`)
and [`sqlrite-mcp`](../../docs/mcp.md) as a child process. It does
not reach into engine internals.

## License

MIT — same as the rest of the rust_sqlite repo.
Loading
Loading