Document DeepSeek agent loop live eval #130

Merged: SonAIengine merged 1 commit into main from docs-deepseek-agent-loop-eval on Jul 2, 2026

@SonAIengine

Summary

  • add the DeepSeek Flash 20-query agent-loop live eval artifacts
  • update the public scale report with DeepSeek vs local qwen fallback comparison
  • record DeepSeek-only hits, exploration behavior, latency/token trade-off, and delayed-discovery cases

Results

  • DeepSeek Flash: 11/20 reach, zero-tool 0/20, duplicate calls 0, empty calls 8
  • local qwen force-first fallback: 9/20 reach on the same subset
  • DeepSeek-only hits: 178627, 68095, 1090242
  • qwen-only hit: 1101278
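The reach counts and the model-exclusive hits above can be cross-checked with a little set arithmetic. The sketch below uses only the numbers reported in this PR; the variable names are illustrative, and treating the listed IDs as query identifiers is an assumption.

```python
# Counts reported in the Results list above.
SUBSET_SIZE = 20
deepseek_reach = 11
qwen_reach = 9

# IDs hit by exactly one of the two models, as listed above
# (assumed here to be query identifiers).
deepseek_only = {178627, 68095, 1090242}
qwen_only = {1101278}

# Each model's reach = shared hits + its exclusive hits, so the
# shared count derived from either side must agree.
shared_from_deepseek = deepseek_reach - len(deepseek_only)
shared_from_qwen = qwen_reach - len(qwen_only)
assert shared_from_deepseek == shared_from_qwen

shared = shared_from_deepseek
union = shared + len(deepseek_only) + len(qwen_only)
missed_by_both = SUBSET_SIZE - union

print(f"shared hits: {shared}")
print(f"reached by at least one model: {union}/{SUBSET_SIZE}")
print(f"missed by both: {missed_by_both}/{SUBSET_SIZE}")
print(f"DeepSeek reach rate: {deepseek_reach / SUBSET_SIZE:.0%}")
print(f"qwen reach rate: {qwen_reach / SUBSET_SIZE:.0%}")
```

The two reach counts are consistent with each other: both imply 8 shared hits, so 12 of the 20 queries were reached by at least one model and 8 by neither.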

Verification

  • DeepSeek preflight: `uv run python examples/ablation/run_agent_loop_benchmarks.py --llm-preset deepseek --preflight-only --preflight-timeout 15`
  • DeepSeek live eval: `PYTHONUNBUFFERED=1 SYNAPTIC_SQLITE_FTS_AND_FIRST_THRESHOLD=20 SYNAPTIC_SQLITE_FTS_LEXICAL_RERANK_POOL=500 uv run python examples/ablation/run_agent_loop_benchmarks.py --llm-preset deepseek --subset 20 --msmarco-path tests/benchmark/data/msmarco_passage_full.json --corpus-limit 8841823 --sqlite-db-path tests/benchmark/data/msmarco_full.db --max-turns 5 --llm-timeout 180 --preflight-timeout 15 --out-jsonl examples/ablation/diagnostics/agent_loop_deepseek_v4_flash_20.jsonl --resume`
  • `git diff --check`
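The live eval writes its per-query records to the `--out-jsonl` path. This PR does not show the record schema, so the sketch below is only a guess at how the headline metrics could be tallied from such a file. The field names (`reached`, `tool_calls`, `name`, `arguments`, `result_count`) are hypothetical, as are the readings of "zero-tool" (a query with no tool calls), "duplicate call" (a repeated identical call within one query), and "empty call" (a call that returned no results).

```python
import json


def summarize(lines):
    """Tally agent-loop metrics from JSONL lines.

    The schema is an assumption for illustration, not the benchmark's
    actual output format.
    """
    stats = {
        "queries": 0,
        "reach": 0,
        "zero_tool": 0,
        "duplicate_calls": 0,
        "empty_calls": 0,
    }
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        calls = rec.get("tool_calls", [])  # hypothetical field
        stats["queries"] += 1
        if rec.get("reached"):  # hypothetical field
            stats["reach"] += 1
        if not calls:
            stats["zero_tool"] += 1
        seen = set()
        for call in calls:
            # Two calls count as duplicates when name and arguments match.
            key = (call["name"], json.dumps(call["arguments"], sort_keys=True))
            if key in seen:
                stats["duplicate_calls"] += 1
            seen.add(key)
            if call.get("result_count", 0) == 0:  # hypothetical field
                stats["empty_calls"] += 1
    return stats


# Synthetic records, not taken from the real eval output.
sample = [
    '{"query_id": 1, "reached": true, "tool_calls": '
    '[{"name": "search", "arguments": {"q": "a"}, "result_count": 3}]}',
    '{"query_id": 2, "reached": false, "tool_calls": '
    '[{"name": "search", "arguments": {"q": "b"}, "result_count": 0}, '
    '{"name": "search", "arguments": {"q": "b"}, "result_count": 0}]}',
    '{"query_id": 3, "reached": false, "tool_calls": []}',
]
print(summarize(sample))
```

Against a real file, the same function could be fed an open file handle, for example `with open(path) as f: summarize(f)`, once the field names are adjusted to the actual schema.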

SonAIengine merged commit 6285e79 into main on Jul 2, 2026
2 checks passed
SonAIengine deleted the docs-deepseek-agent-loop-eval branch on July 2, 2026, 10:44