2 changes: 2 additions & 0 deletions README.md
@@ -149,6 +149,8 @@ We benchmark quality and speed across all methods on ~1,250 queries over 63 repo
| CodeRankEmbed | 0.765 | 57 s | 16 ms |
| ColGREP | 0.693 | 5.8 s | 124 ms |
| BM25 | 0.673 | 263 ms | 0.02 ms |
| grepai | 0.561 | 35 s | 48 ms |
| probe | 0.387 | — | 207 ms |
| ripgrep | 0.126 | — | 12 ms |

Semble achieves 99% of the performance of the 137M-parameter [CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed) Hybrid, while indexing 218x faster and answering queries 11x faster. See [benchmarks](benchmarks/README.md) for per-language results, ablations, and methodology.
Binary file modified assets/images/speed_vs_ndcg_cold.png
Binary file modified assets/images/speed_vs_ndcg_warm.png
92 changes: 70 additions & 22 deletions benchmarks/README.md
@@ -7,6 +7,7 @@ Quality and speed benchmarks for `semble`.
- [Ablations](#ablations)
- [Dataset](#dataset)
- [Methods](#methods)
- [Excluded methods](#excluded-methods)
- [Running the benchmarks](#running-the-benchmarks)

## Main results
@@ -20,6 +21,8 @@ Quality and speed across all methods.
| CodeRankEmbed | 0.765 | 57 s | 16 ms |
| ColGREP | 0.693 | 5.8 s | 124 ms |
| BM25 | 0.673 | 263 ms | 0.02 ms |
| grepai | 0.561 | 35 s | 48 ms |
| probe | 0.387 | — | 207 ms |
| ripgrep | 0.126 | — | 12 ms |

| ![Speed vs quality (cold)](../assets/images/speed_vs_ndcg_cold.png) | ![Speed vs quality (warm)](../assets/images/speed_vs_ndcg_warm.png) |
@@ -34,28 +37,28 @@ NDCG@10 is averaged across all queries. Speed numbers use one repo per language,

NDCG@10 per language, sorted by CodeRankEmbed Hybrid (CRE in the table). Best score per row is bolded.

| Language | semble | CRE Hybrid | CRE | ColGREP | ripgrep |
|---|---:|---:|---:|---:|---:|
| scala | 0.909 | **0.922** | 0.845 | 0.765 | 0.180 |
| cpp | **0.915** | 0.913 | 0.846 | 0.626 | 0.126 |
| ruby | **0.909** | **0.909** | 0.769 | 0.708 | 0.230 |
| elixir | 0.894 | **0.905** | 0.869 | 0.808 | 0.134 |
| javascript | 0.917 | 0.903 | **0.920** | 0.823 | 0.176 |
| zig | **0.913** | 0.901 | 0.807 | 0.474 | 0.000 |
| csharp | 0.885 | **0.889** | 0.743 | 0.614 | 0.117 |
| go | **0.895** | 0.884 | 0.676 | 0.785 | 0.133 |
| python | 0.867 | **0.880** | 0.794 | 0.777 | 0.202 |
| php | 0.858 | **0.874** | 0.758 | 0.663 | 0.123 |
| swift | 0.860 | **0.873** | 0.721 | 0.710 | 0.160 |
| bash | 0.825 | 0.852 | **0.892** | 0.706 | 0.000 |
| lua | 0.823 | **0.847** | 0.803 | 0.798 | 0.000 |
| java | **0.849** | 0.841 | 0.706 | 0.641 | 0.198 |
| kotlin | 0.821 | **0.830** | 0.670 | 0.637 | 0.166 |
| rust | **0.856** | 0.827 | 0.627 | 0.662 | 0.162 |
| c | 0.741 | **0.806** | 0.706 | 0.676 | 0.000 |
| haskell | 0.765 | 0.771 | **0.776** | 0.683 | 0.000 |
| typescript | 0.706 | **0.708** | 0.545 | 0.430 | 0.128 |
| **overall** | **0.854** | **0.862** | **0.765** | **0.693** | **0.126** |
| Language | semble | CRE Hybrid | CRE | ColGREP | grepai | probe | ripgrep |
|---|---:|---:|---:|---:|---:|---:|---:|
| scala | 0.909 | **0.922** | 0.845 | 0.765 | 0.330 | 0.392 | 0.180 |
| cpp | **0.915** | 0.913 | 0.846 | 0.626 | 0.731 | 0.375 | 0.126 |
| ruby | **0.909** | **0.909** | 0.769 | 0.708 | 0.643 | 0.382 | 0.230 |
| elixir | 0.894 | **0.905** | 0.869 | 0.808 | 0.669 | 0.412 | 0.134 |
| javascript | 0.917 | 0.903 | **0.920** | 0.823 | 0.675 | 0.588 | 0.176 |
| zig | **0.913** | 0.901 | 0.807 | 0.474 | 0.755 | 0.369 | 0.000 |
| csharp | 0.885 | **0.889** | 0.743 | 0.614 | 0.277 | 0.392 | 0.117 |
| go | **0.895** | 0.884 | 0.676 | 0.785 | 0.722 | 0.410 | 0.133 |
| python | 0.867 | **0.880** | 0.794 | 0.777 | 0.634 | 0.488 | 0.202 |
| php | 0.858 | **0.874** | 0.758 | 0.663 | 0.402 | 0.340 | 0.123 |
| swift | 0.860 | **0.873** | 0.721 | 0.710 | 0.429 | 0.280 | 0.160 |
| bash | 0.825 | 0.852 | **0.892** | 0.706 | 0.723 | 0.226 | 0.000 |
| lua | 0.823 | **0.847** | 0.803 | 0.798 | 0.699 | 0.336 | 0.000 |
| java | **0.849** | 0.841 | 0.706 | 0.641 | 0.386 | 0.536 | 0.198 |
| kotlin | 0.821 | **0.830** | 0.670 | 0.637 | 0.478 | 0.335 | 0.166 |
| rust | **0.856** | 0.827 | 0.627 | 0.662 | 0.519 | 0.242 | 0.162 |
| c | 0.741 | **0.806** | 0.706 | 0.676 | 0.555 | 0.384 | 0.000 |
| haskell | 0.765 | 0.771 | **0.776** | 0.683 | 0.483 | 0.313 | 0.000 |
| typescript | 0.706 | **0.708** | 0.545 | 0.430 | 0.394 | 0.354 | 0.128 |
| **overall** | **0.854** | **0.862** | **0.765** | **0.693** | **0.561** | **0.387** | **0.126** |
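
For reference, NDCG@10 itself is straightforward to compute. The sketch below is illustrative only, not the harness's actual code, and assumes each query carries a mapping from gold file paths to relevance grades:

```python
import math

def ndcg_at_k(ranked: list[str], relevant: dict[str, float], k: int = 10) -> float:
    # DCG: each hit's relevance grade, discounted by log2 of (1-based rank + 1).
    dcg = sum(
        relevant.get(doc, 0.0) / math.log2(rank + 2)
        for rank, doc in enumerate(ranked[:k])
    )
    # IDCG: the best achievable DCG, i.e. all relevant docs ranked first.
    ideal = sorted(relevant.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Example: one gold file, retrieved at rank 2.
print(ndcg_at_k(["a.py", "b.py"], {"b.py": 1.0}))  # log2(2)/log2(3) ≈ 0.631
```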

## Ablations

@@ -102,10 +105,19 @@ NDCG@10 per language, sorted by CodeRankEmbed Hybrid (CRE in the table). Best sc
## Methods

- **[ripgrep](https://github.com/BurntSushi/ripgrep)**: fast regex search over files, included as a raw keyword-match baseline.
- **[probe](https://github.com/buger/probe)**: BM25 keyword ranking backed by tree-sitter parse trees. No persistent index; scans on the fly.
- **[ColGREP](https://github.com/lightonai/next-plaid/tree/main/colgrep)**: late-interaction code retrieval built on next-plaid with the [LateOn-Code-edge](https://huggingface.co/lightonai/LateOn-Code-edge) model.
- **[grepai](https://github.com/nicholasgasior/grepai)**: semantic search using [nomic-embed-text](https://huggingface.co/nomic-ai/nomic-embed-text-v1) (137M params) via a local Ollama daemon.
- **[CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed)**: 137M-param transformer embedding model for code retrieval. *CodeRankEmbed Hybrid* fuses its dense scores with BM25 (see the fusion sketch after this list).
- **[semble](https://github.com/your-repo/semble)**: this library. [potion-code-16M](https://huggingface.co/minishlab/potion-code-16M) static embeddings + BM25 + the semble reranking stack.
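
As a rough illustration of the hybrid fusion mentioned above: min-max normalize each method's scores, then take a weighted sum. This is a sketch of one common choice, not necessarily the fusion CodeRankEmbed Hybrid uses in this benchmark, and `alpha` is an assumed parameter:

```python
def fuse(dense: dict[str, float], bm25: dict[str, float], alpha: float = 0.5) -> dict[str, float]:
    """Weighted sum of min-max-normalized dense and BM25 scores."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # all-equal scores would otherwise divide by zero
        return {doc: (s - lo) / span for doc, s in scores.items()}

    d, b = normalize(dense), normalize(bm25)
    # Docs scored by only one method get 0 from the other.
    return {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
        for doc in d.keys() | b.keys()
    }
```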

## Excluded methods

Two tools were considered but not included in the benchmark:

- **[codanna](https://codanna.io)**: symbol-level semantic search with fastembed. Excluded because it does not support Haskell, Bash, Zig, Scala, Elixir, or Ruby: 6 of the 19 benchmark languages, covering 20 of the 63 repos (~38% of tasks).
- **[claude-context](https://github.com/zilliztech/claude-context)**: retrieval-augmented code search using OpenAI embeddings and a vector database. Excluded because it requires a paid OpenAI API key and a running vector-DB service.

## Running the benchmarks

Repos are pinned in `repos.json` and cloned into `~/.cache/semble-bench`:
@@ -152,6 +164,42 @@ uv run python -m benchmarks.baselines.ablations --mode semble-semantic

</details>

<details>
<summary>probe</summary>

Needs `probe` on `$PATH` (`npm install -g @buger/probe`).

```bash
uv run python -m benchmarks.baselines.probe
uv run python -m benchmarks.baselines.probe --repo fastapi --repo axios
```

</details>

<details>
<summary>grepai</summary>

Needs `grepai` on `$PATH` and Ollama running with `nomic-embed-text` pulled:

```bash
ollama pull nomic-embed-text
```

```bash
uv run python -m benchmarks.baselines.grepai
uv run python -m benchmarks.baselines.grepai --repo fastapi --repo axios
```

Large repos can take several minutes to index, so raise `--timeout <seconds>` (default 120) for repos with many files:

```bash
uv run python -m benchmarks.baselines.grepai --timeout 1800 --output results.json
```

The `--output` flag enables resume mode: already-completed repos are skipped on restart.
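
A minimal sketch of that resume pattern, assuming the output file is a JSON object keyed by repo name (`benchmark_repo` is a hypothetical stand-in for the per-repo runner):

```python
import json
from pathlib import Path

def run_all(repos: list[str], output: Path) -> None:
    # Resume: an existing results file seeds the run; absent file means fresh start.
    results = json.loads(output.read_text()) if output.exists() else {}
    for repo in repos:
        if repo in results:
            continue  # completed on a previous run, skip
        results[repo] = benchmark_repo(repo)  # hypothetical per-repo runner
        # Checkpoint after every repo so an interrupt loses at most one result.
        output.write_text(json.dumps(results, indent=2))
```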

</details>

<details>
<summary>ripgrep</summary>
