Add MS MARCO large-tier benchmark path #103

Merged: SonAIengine merged 1 commit into main from codex/msmarco-large-tier on Jul 2, 2026

@SonAIengine

Summary

  • add a JSONL-sharded MS MARCO passage downloader for large public-corpus evaluation
  • teach the tier1 runner to load metadata JSON plus corpus JSONL, keeping the gold docs of the selected queries in the corpus even when a corpus limit is applied
  • add optional MS MARCO large-tier inputs for manual dispatch of the public scale workflow
  • document the new 100k MS MARCO smoke result and ignore generated JSONL artifacts
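
The gold-preserving corpus limit from the second bullet can be sketched as follows. This is a minimal illustration, not the code in `run_tier1_benchmarks.py`: the function name `limit_corpus_keep_gold`, the `id` field, and the "gold first, then filler" ordering are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def limit_corpus_keep_gold(
    docs: Iterable[Dict[str, str]],
    gold_ids: Set[str],
    corpus_limit: int,
) -> List[Dict[str, str]]:
    """Cap the corpus at `corpus_limit` docs without dropping gold docs.

    Hypothetical helper: gold docs are always kept (even if there are more
    of them than the limit); non-gold docs fill the remaining budget in
    stream order.
    """
    gold: List[Dict[str, str]] = []
    filler: List[Dict[str, str]] = []
    for doc in docs:
        if doc["id"] in gold_ids:
            gold.append(doc)
        elif len(filler) < corpus_limit:
            # Never buffer more filler than the limit could possibly use.
            filler.append(doc)
    # The whole stream is scanned, because a gold doc may sit past the cut.
    budget = max(corpus_limit - len(gold), 0)
    return gold + filler[:budget]


# Toy corpus of 10 docs; the gold docs for the selected queries are near the end.
corpus = [{"id": str(i), "text": f"passage {i}"} for i in range(10)]
limited = limit_corpus_keep_gold(corpus, gold_ids={"7", "9"}, corpus_limit=4)
print([d["id"] for d in limited])  # ['7', '9', '0', '1']
```

A plain "take the first N docs" cut would drop docs 7 and 9 here, making the affected queries unanswerable regardless of retrieval quality. That is the failure mode the bullet describes avoiding.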

Local large-tier result

  • MS MARCO passage dev, 100k docs, 50 queries, embedder-free SQLite/EvidenceSearch
  • MRR@10 0.673, R@5 0.740, R@10 0.770, Hit@10 39/50
  • Build 81.9s, Search 5.4s
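
For readers unfamiliar with the metrics above, here is a small sketch using their standard definitions. The PR does not show how the runner computes them, so the function names and the toy data are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple


def mrr_at_k(ranked: List[str], gold: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first gold doc within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in gold:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """Fraction of a query's gold docs that appear in the top k."""
    if not gold:
        return 0.0
    return len(set(ranked[:k]) & gold) / len(gold)


def hit_at_k(ranked: List[str], gold: Set[str], k: int) -> bool:
    """True if at least one gold doc appears in the top k."""
    return any(doc_id in gold for doc_id in ranked[:k])


# Toy runs (made up): query -> (ranked doc ids, gold doc ids).
runs: Dict[str, Tuple[List[str], Set[str]]] = {
    "q1": (["d3", "d1", "d9"], {"d1"}),   # gold at rank 2
    "q2": (["d4", "d5"], {"d4", "d7"}),   # one of two gold docs, at rank 1
}

n = len(runs)
mean_mrr = sum(mrr_at_k(r, g, 10) for r, g in runs.values()) / n
mean_recall = sum(recall_at_k(r, g, 10) for r, g in runs.values()) / n
hits = sum(hit_at_k(r, g, 10) for r, g in runs.values())
print(f"MRR@10 {mean_mrr:.3f}, R@10 {mean_recall:.3f}, Hit@10 {hits}/{n}")
# MRR@10 0.750, R@10 0.750, Hit@10 2/2
```

Under these definitions, recall can be fractional for a query with several gold docs while Hit@k stays binary. That is one possible reading of why the reported R@10 (0.770) sits slightly below 39/50 = 0.780.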

Tests

  • uv run --extra dev ruff check examples/ablation/download_benchmarks.py examples/ablation/run_tier1_benchmarks.py tests/test_tier1_benchmarks.py tests/test_download_benchmarks.py
  • uv run --extra dev ruff format --check examples/ablation/download_benchmarks.py examples/ablation/run_tier1_benchmarks.py tests/test_tier1_benchmarks.py tests/test_download_benchmarks.py
  • uv run --extra dev pytest tests/test_tier1_benchmarks.py tests/test_download_benchmarks.py -q
  • uv run python - <<PY
    import yaml
    from pathlib import Path
    yaml.safe_load(Path(".github/workflows/public-scale.yml").read_text())
    print("yaml ok")
    PY
  • PYTHONUNBUFFERED=1 uv run --extra eval python examples/ablation/download_benchmarks.py --only msmarco_passage --large-corpus-limit 100000
  • PYTHONUNBUFFERED=1 uv run --extra sqlite python examples/ablation/run_tier1_benchmarks.py --only msmarco --subset 50 --corpus-limit 100000 --use-sqlite-graph

@SonAIengine SonAIengine merged commit 510ad46 into main Jul 2, 2026
2 checks passed
@SonAIengine SonAIengine deleted the codex/msmarco-large-tier branch July 2, 2026 02:24