Document MS MARCO 1M scale result#104

Merged
SonAIengine merged 1 commit into main from codex/msmarco-million-progress
Jul 2, 2026

@SonAIengine

Contributor

Summary

  • document the completed MS MARCO 1M large-tier run
  • add ingest progress output to the tier1 runner for long corpus builds
  • cover progress logging with a focused unit test
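The progress output described above can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: `ingest_with_progress`, `batched`, and the `ingest_batch` callback are hypothetical names, and the `batch_size` and `progress_every` parameters only mirror the `--ingest-batch` and `--progress-every` flags used in the test commands. The output stream and clock are injectable so a focused unit test can capture lines without touching stderr or real time.

```python
import sys
import time
from typing import Callable, Iterable, Iterator, List, TextIO


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` items, preserving order."""
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def ingest_with_progress(
    docs: Iterable[dict],
    ingest_batch: Callable[[List[dict]], None],  # hypothetical store callback
    batch_size: int = 1000,
    progress_every: int = 10000,
    out: TextIO = sys.stderr,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Ingest docs in batches and print a progress line every `progress_every` docs."""
    start = clock()
    total = 0
    next_report = progress_every
    for batch in batched(docs, batch_size):
        ingest_batch(batch)
        total += len(batch)
        if progress_every > 0 and total >= next_report:
            elapsed = clock() - start
            rate = total / elapsed if elapsed > 0 else 0.0
            print(
                f"[ingest] {total} docs in {elapsed:.1f}s ({rate:.0f} docs/s)",
                file=out,
                flush=True,  # keep long builds visible even when output is piped
            )
            # Advance past the current total so one large batch reports only once.
            while next_report <= total:
                next_report += progress_every
    return total
```

Passing an `io.StringIO` as `out` lets a test assert on the number of progress lines without running a real corpus build.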

1M large-tier result

  • MS MARCO passage dev, 1,000,000 docs, 50 queries, embedder-free SQLite/EvidenceSearch
  • MRR@10 0.462, R@5 0.543, R@10 0.580, Hit@10 30/50
  • Build 1913.3s (31.9m), Search 69.9s
  • Local shard: 361 MB corpus JSONL + 511 KB manifest, gitignored
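For readers unfamiliar with the metrics above, the sketch below shows the standard definitions of MRR@k, Recall@k, and Hit@k. It assumes the usual textbook formulas; this PR does not show how the tier1 runner computes them, and `reciprocal_rank`, `recall_at_k`, `hit_at_k`, and `summarize` are hypothetical names.

```python
from typing import Dict, Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1/rank of the first relevant doc within the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant docs that appear within the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> bool:
    """True if at least one relevant doc appears within the top k."""
    return any(doc_id in relevant for doc_id in ranked[:k])


def summarize(
    runs: Dict[str, Sequence[str]],  # query id -> ranked doc ids
    qrels: Dict[str, Set[str]],      # query id -> relevant doc ids
    k: int = 10,
) -> Dict[str, float]:
    """Average the per-query metrics over all queries in `runs`."""
    n = len(runs)
    if n == 0:
        return {"mrr": 0.0, "recall": 0.0, "hits": 0.0}
    mrr = sum(reciprocal_rank(r, qrels.get(q, set()), k) for q, r in runs.items()) / n
    recall = sum(recall_at_k(r, qrels.get(q, set()), k) for q, r in runs.items()) / n
    hits = sum(hit_at_k(r, qrels.get(q, set()), k) for q, r in runs.items())
    return {"mrr": mrr, "recall": recall, "hits": float(hits)}
```

Under these definitions a query's recall can never exceed its hit indicator, which is consistent with R@10 0.580 sitting just below the hit rate of 30/50 = 0.600.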

Tests

  • uv run --extra dev ruff check examples/ablation/run_tier1_benchmarks.py tests/test_tier1_benchmarks.py
  • uv run --extra dev ruff format --check examples/ablation/run_tier1_benchmarks.py tests/test_tier1_benchmarks.py
  • uv run --extra dev pytest tests/test_tier1_benchmarks.py -q
  • PYTHONUNBUFFERED=1 uv run --extra sqlite python examples/ablation/run_tier1_benchmarks.py --only msmarco --subset 1 --corpus-limit 2 --use-sqlite-graph --ingest-batch 1 --progress-every 1
  • PYTHONUNBUFFERED=1 uv run --extra sqlite python examples/ablation/run_tier1_benchmarks.py --only msmarco --subset 50 --corpus-limit 1000000 --use-sqlite-graph

@SonAIengine SonAIengine merged commit 9fc1995 into main Jul 2, 2026
2 checks passed
@SonAIengine SonAIengine deleted the codex/msmarco-million-progress branch July 2, 2026 03:02