61 changes: 61 additions & 0 deletions examples/ablation/diagnostics/agent_loop_20260702_194134.md
@@ -0,0 +1,61 @@
# Agent Loop Retrieval Benchmark — Synaptic

- Run at: 2026-07-02 19:41:34 KST
- Dataset path: tests/benchmark/data/msmarco_passage_full.json
- SQLite DB path: tests/benchmark/data/msmarco_full.db
- Subset: 20 queries
- Corpus limit: 8841823 (the full MS MARCO passage collection)
- LLM base URL: https://api.deepseek.com/v1
- Model: deepseek-v4-flash
- Max turns: 5
- Sufficiency gate: yes
- Force first tool: yes
- Incremental JSONL: examples/ablation/diagnostics/agent_loop_deepseek_v4_flash_20.jsonl
- SQLite FTS AND-first threshold: 20
- SQLite FTS lexical rerank pool: 500
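
The report does not define the two SQLite FTS settings. One plausible reading, and it is only an assumption, is this: run the FTS query with all terms ANDed, fall back to OR when fewer than 20 rows match, then lexically rerank a pool of at most 500 candidates. The sketch below implements that reading with Python's built-in `sqlite3` module and an FTS5 table. The table name `docs`, the helper names, and the term-overlap scorer are hypothetical, and the code needs an SQLite build with FTS5 enabled.

```python
import re
import sqlite3
from typing import List

# Assumed meanings of the two report settings (not confirmed by the source).
AND_FIRST_THRESHOLD = 20  # fall back to OR when the AND query returns fewer rows
RERANK_POOL = 500         # max candidates pulled from FTS before reranking


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; each is later double-quoted so FTS5
    # treats it as a plain string rather than query syntax.
    return re.findall(r"[a-z0-9]+", text.lower())


def and_first_search(conn: sqlite3.Connection, query: str, k: int = 5) -> List[int]:
    terms = tokenize(query)
    if not terms:
        return []
    quoted = ['"%s"' % t for t in terms]
    sql = "SELECT rowid, body FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?"
    # Strict pass: every query term must occur in the document.
    rows = conn.execute(sql, (" AND ".join(quoted), RERANK_POOL)).fetchall()
    # Too few strict hits: relax to OR to widen the candidate pool.
    if len(rows) < AND_FIRST_THRESHOLD:
        rows = conn.execute(sql, (" OR ".join(quoted), RERANK_POOL)).fetchall()
    # Stand-in lexical rerank: documents covering more distinct query terms
    # first; the stable sort keeps FTS5's bm25 order among ties.
    query_terms = set(terms)
    rows.sort(key=lambda row: len(query_terms & set(tokenize(row[1]))), reverse=True)
    return [rowid for rowid, _ in rows[:k]]


# Toy corpus for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.executemany(
    "INSERT INTO docs(rowid, body) VALUES (?, ?)",
    [
        (1, "William Bradford served as governor of Plymouth Colony"),
        (2, "Plymouth is a town in Massachusetts"),
        (3, "Photoshop color overlay layer style"),
    ],
)
print(and_first_search(conn, "bradford governor plymouth"))
```

Under this reading, the AND pass keeps specific queries precise, and the OR fallback only widens recall when the conjunction matches fewer rows than the threshold.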

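This benchmark measures LLM-planned exploration: the agent can change its follow-up queries and tool choices based on evidence gathered in earlier turns. The main metric is document reach rather than ranked MRR, because the agent loop returns a cumulative evidence set instead of a ranked list. A query counts as reached when at least one of its relevant documents appears anywhere in that set.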
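
A minimal sketch of such a loop, under stated assumptions: the planner is a stub standing in for the LLM, tools return lists of document IDs, and `run_agent_loop`, `reached`, and the toy tools are hypothetical names. The forced first call and the sufficiency gate are modeled on the header settings, but their exact semantics here are assumptions. The final answering turn counts as a turn, which matches row 9083 (2 turns, 1 tool call).

```python
from typing import Callable, Dict, List, Set, Tuple

ToolCall = Tuple[str, str]  # (tool name, target); hypothetical shape
Planner = Callable[[str, Set[str], int], List[ToolCall]]


def run_agent_loop(
    query: str,
    tools: Dict[str, Callable[[str], List[str]]],
    plan: Planner,
    max_turns: int = 5,
    force_first_tool: bool = True,
    is_sufficient: Callable[[Set[str]], bool] = lambda evidence: False,
) -> Tuple[Set[str], int]:
    """Return the cumulative evidence set and the number of turns used."""
    evidence: Set[str] = set()  # unordered: later turns add docs, never re-rank
    turns = 0
    for turn in range(1, max_turns + 1):
        turns = turn
        calls = plan(query, evidence, turn)  # stands in for the LLM planner
        if not calls and turn == 1 and force_first_tool:
            calls = [("search", query)]  # assumed behaviour of "force first tool"
        if not calls:
            break  # planner chose to answer; this turn still counts
        for name, target in calls:
            evidence.update(tools[name](target))
        if is_sufficient(evidence):  # assumed behaviour of the sufficiency gate
            break
    return evidence, turns


def reached(evidence: Set[str], relevant: Set[str]) -> bool:
    # Reach: at least one relevant document is anywhere in the evidence set.
    return bool(evidence & relevant)


# Toy run: one rewritten search on turn 1, then an answer on turn 2.
toy_index = {"father of modern medicine": ["d1", "d2"]}
toy_tools = {"search": lambda target: toy_index.get(target, [])}


def toy_plan(query: str, evidence: Set[str], turn: int) -> List[ToolCall]:
    return [("search", "father of modern medicine")] if turn == 1 else []


evidence, turns = run_agent_loop(
    "___ is considered the father of modern medicine.", toy_tools, toy_plan
)
print(sorted(evidence), turns, reached(evidence, {"d1"}))
```

Because the evidence set has no order, a relevant document found on turn 3 scores the same as one found on turn 1; the separate first-relevant-turn metrics capture that difference.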

## Summary

- Reach: 11/20 (0.550)
- Mean turns: 4.10
- Mean tool calls: 5.90
- Mean first relevant turn (reached queries only): 1.55
- Mean first relevant tool calls (reached queries only): 2.36
- Mean elapsed: 50.0s
- P50/P90 elapsed: 45.8s / 67.0s
- Mean prompt tokens: 20049
- Mean completion tokens: 966
- Mean unique tools: 2.35
- Mean unique search targets: 5.50
- Mean query rewrites: 4.25
- Queries with >1 tool type: 19/20
- Queries with query rewrites: 19/20
- Duplicate tool calls (total across all queries): 0
- Empty tool calls (total across all queries): 8
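
The summary lines can be recomputed from the per-query records in the incremental JSONL file. A sketch, assuming only the field names visible in that file; `load_records` and `summarize` are hypothetical helpers. It makes two conventions explicit: unreached queries store `first_relevant_turn` as 0, so the first-relevant means are taken over reached queries only (17/11 ≈ 1.55 turns and 26/11 ≈ 2.36 calls in this run), and P50 is the median of `elapsed_sec`. The report does not say how P90 is computed, so it is left out.

```python
import json
import statistics
from typing import Dict, List


def load_records(path: str) -> List[dict]:
    # One JSON object per line, as in the incremental JSONL file.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(records: List[dict]) -> Dict[str, float]:
    hits = [r for r in records if r["reached"]]
    return {
        "reach": len(hits) / len(records),
        "mean_turns": statistics.mean([r["turns"] for r in records]),
        "mean_tool_calls": statistics.mean([r["tool_calls"] for r in records]),
        # Unreached rows store 0 here, so average over reached queries only.
        "mean_first_relevant_turn": (
            statistics.mean([r["first_relevant_turn"] for r in hits]) if hits else float("nan")
        ),
        "p50_elapsed": statistics.median([r["elapsed_sec"] for r in records]),
        # Counts such as empty tool calls are totals, not means.
        "empty_tool_calls": sum(r["empty_tool_calls"] for r in records),
    }


# Three toy records with the same fields (illustrative values only).
demo = [
    {"reached": True, "turns": 2, "tool_calls": 1, "first_relevant_turn": 1,
     "elapsed_sec": 28.3, "empty_tool_calls": 0},
    {"reached": False, "turns": 3, "tool_calls": 3, "first_relevant_turn": 0,
     "elapsed_sec": 38.5, "empty_tool_calls": 0},
    {"reached": True, "turns": 5, "tool_calls": 13, "first_relevant_turn": 3,
     "elapsed_sec": 75.8, "empty_tool_calls": 2},
]
print(summarize(demo))
```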

## Per Query

| QID | Reach | Turns | Tool Calls | Unique Tools | Unique Targets | Rewrites | First Rel Turn | First Rel Calls | Found Relevant | Elapsed | Query |
|-----|:-----:|------:|------:|------:|--------:|---------:|---------------:|----------------:|----------------|--------:|-------|
| 300674 | yes | 4 | 4 | 2 | 3 | 3 | 1 | 1 | 7067032 | 36.2s | how many years did william bradford serve as governor of plymouth colony? |
| 125705 | no | 3 | 3 | 2 | 2 | 2 | - | - | - | 38.5s | define preventive |
| 94798 | yes | 4 | 5 | 2 | 5 | 4 | 1 | 1 | 7067181 | 46.4s | color overlay photoshop |
| 9083 | yes | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 7067274 | 28.3s | ____________________ is considered the father of modern medicine. |
| 174249 | no | 4 | 3 | 2 | 3 | 3 | - | - | - | 39.9s | does xpress bet charge to deposit money in your account |
| 320792 | no | 5 | 7 | 3 | 7 | 7 | - | - | - | 61.3s | how much is a cost to run disneyland |
| 1090270 | yes | 4 | 3 | 2 | 1 | 0 | 1 | 1 | 7067796 | 40.0s | botulinum definition |
| 1101279 | no | 5 | 6 | 2 | 6 | 4 | - | - | - | 53.8s | do physicians pay for insurance from their salaries? |
| 201376 | yes | 5 | 8 | 3 | 8 | 2 | 1 | 1 | 7068066 | 60.4s | here there be dragons comic |
| 54544 | no | 5 | 10 | 3 | 10 | 10 | - | - | - | 67.0s | blood diseases that are sexually transmitted |
| 118457 | no | 3 | 3 | 2 | 2 | 1 | - | - | - | 38.9s | define bona fides |
| 178627 | yes | 5 | 13 | 3 | 13 | 12 | 3 | 6 | 7068519 | 75.8s | effects of detox juice cleanse |
| 1101278 | no | 3 | 4 | 2 | 4 | 1 | - | - | - | 41.6s | do prince harry and william have last names |
| 68095 | yes | 4 | 8 | 2 | 8 | 4 | 1 | 1 | 7069266 | 59.2s | can hives be a sign of pregnancy |
| 87892 | yes | 5 | 10 | 3 | 8 | 8 | 3 | 5 | 7069601 | 64.6s | causes of petechial hemorrhage |
| 257309 | no | 4 | 5 | 3 | 5 | 5 | - | - | - | 54.1s | how long does it take to get your bsrn if you already have a bachelors degree |
| 1090242 | yes | 5 | 14 | 3 | 14 | 13 | 3 | 7 | 7070556 | 70.1s | symptoms of ptsd in vietnam veterans |
| 211691 | no | 5 | 5 | 3 | 4 | 3 | - | - | - | 45.3s | how coffee works quote |
| 165002 | yes | 4 | 4 | 2 | 4 | 1 | 1 | 1 | 7070877 | 44.6s | does contraction of the ciliary muscles shorten the lens |
| 1101276 | yes | 3 | 2 | 2 | 2 | 1 | 1 | 1 | 7070950 | 34.6s | do spiders eat other animals |
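
The targets and rewrites columns appear to be derived from each record's `search_targets` list. Every row is consistent with counting distinct targets (which can include numeric document IDs) and counting as rewrites the distinct non-numeric targets that differ from the original query. That rule is inferred from the data, not taken from the benchmark code, and `count_targets_and_rewrites` is a hypothetical helper. Row 68095, for example, has eight targets: three are document IDs and one repeats the query verbatim, which leaves four rewrites.

```python
from typing import List, Tuple


def count_targets_and_rewrites(query: str, search_targets: List[str]) -> Tuple[int, int]:
    """Inferred rule: return (unique targets, query rewrites) for one record."""
    unique = list(dict.fromkeys(search_targets))  # distinct targets, order kept
    normalized_query = query.strip().lower()  # this normalization is an assumption
    rewrites = [
        t
        for t in unique
        if not t.isdigit()  # numeric targets are document IDs, not queries
        and t.strip().lower() != normalized_query  # the verbatim query is not a rewrite
    ]
    return len(unique), len(rewrites)


# Targets copied from record 68095 in the JSONL file.
hives_targets = [
    "can hives be a sign of pregnancy", "hives and pregnancy",
    "hives early pregnancy symptom", "hives during pregnancy",
    "hives during pregnancy causes", "3114980", "7069269", "6159958",
]
print(count_targets_and_rewrites("can hives be a sign of pregnancy", hives_targets))
```

Applied to all 20 records, this rule reproduces both columns, including the zero-rewrite row 1090270, whose only target is the query itself.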
@@ -0,0 +1,20 @@
{"completion_tokens": 701, "duplicate_tool_calls": 0, "elapsed_sec": 36.18947866698727, "empty_tool_calls": 0, "first_relevant_tool_calls": 1, "first_relevant_turn": 1, "found_ids_count": 10, "found_relevant_docs": ["7067032"], "prompt_tokens": 16545, "qid": "300674", "query": "how many years did william bradford serve as governor of plymouth colony?", "query_rewrites": 3, "reached": true, "relevant_docs": ["7067032"], "search_targets": ["william bradford governor plymouth colony years served", "william bradford governor plymouth colony years", "william bradford served as plymouth governor for 30 years"], "tool_calls": 4, "tool_sequence": ["search", "get_document", "get_document", "get_document"], "turns": 4, "unique_search_targets": 3, "unique_tools": 2}
{"completion_tokens": 702, "duplicate_tool_calls": 0, "elapsed_sec": 38.52050423901528, "empty_tool_calls": 0, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 5, "found_relevant_docs": [], "prompt_tokens": 10944, "qid": "125705", "query": "define preventive", "query_rewrites": 2, "reached": false, "relevant_docs": ["7067056"], "search_targets": ["preventive definition", "preventive maintenance definition"], "tool_calls": 3, "tool_sequence": ["deep_search", "get_document", "get_document"], "turns": 3, "unique_search_targets": 2, "unique_tools": 2}
{"completion_tokens": 956, "duplicate_tool_calls": 0, "elapsed_sec": 46.40595618914813, "empty_tool_calls": 0, "first_relevant_tool_calls": 1, "first_relevant_turn": 1, "found_ids_count": 5, "found_relevant_docs": ["7067181"], "prompt_tokens": 16293, "qid": "94798", "query": "color overlay photoshop", "query_rewrites": 4, "reached": true, "relevant_docs": ["7067181"], "search_targets": ["color overlay photoshop", "photoshop layer styles color overlay settings", "color overlay photoshop layer styles", "photoshop color overlay", "how to use color overlay in photoshop"], "tool_calls": 5, "tool_sequence": ["deep_search", "get_document", "get_document", "get_document", "get_document"], "turns": 4, "unique_search_targets": 5, "unique_tools": 2}
{"completion_tokens": 208, "duplicate_tool_calls": 0, "elapsed_sec": 28.265028935857117, "empty_tool_calls": 0, "first_relevant_tool_calls": 1, "first_relevant_turn": 1, "found_ids_count": 5, "found_relevant_docs": ["7067274"], "prompt_tokens": 6460, "qid": "9083", "query": "____________________ is considered the father of modern medicine.", "query_rewrites": 1, "reached": true, "relevant_docs": ["7067274"], "search_targets": ["father of modern medicine"], "tool_calls": 1, "tool_sequence": ["deep_search"], "turns": 2, "unique_search_targets": 1, "unique_tools": 1}
{"completion_tokens": 773, "duplicate_tool_calls": 0, "elapsed_sec": 39.93833482963964, "empty_tool_calls": 0, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 5, "found_relevant_docs": [], "prompt_tokens": 14916, "qid": "174249", "query": "does xpress bet charge to deposit money in your account", "query_rewrites": 3, "reached": false, "relevant_docs": ["7067348"], "search_targets": ["xpressbet deposit fees charges", "xpressbet deposit convenience fee cashier", "xpressbet no per wager fees charged"], "tool_calls": 3, "tool_sequence": ["deep_search", "get_document", "get_document"], "turns": 4, "unique_search_targets": 3, "unique_tools": 2}
{"completion_tokens": 919, "duplicate_tool_calls": 0, "elapsed_sec": 61.31934346398339, "empty_tool_calls": 0, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 45, "found_relevant_docs": [], "prompt_tokens": 29795, "qid": "320792", "query": "how much is a cost to run disneyland", "query_rewrites": 7, "reached": false, "relevant_docs": ["7067677"], "search_targets": ["disneyland operating cost operating expense", "disneyland cost to operate per day annual operating cost", "disneyland operating cost", "disneyland resort annual operating expense cost to run", "\"disneyland\" \"operating\" expense million billion", "disneyland operating cost expense billion million", "disneyland park operating costs annual expense revenue"], "tool_calls": 7, "tool_sequence": ["deep_search", "search", "get_document", "search", "search", "get_document", "search"], "turns": 5, "unique_search_targets": 7, "unique_tools": 3}
{"completion_tokens": 726, "duplicate_tool_calls": 0, "elapsed_sec": 39.983878755010664, "empty_tool_calls": 0, "first_relevant_tool_calls": 1, "first_relevant_turn": 1, "found_ids_count": 5, "found_relevant_docs": ["7067796"], "prompt_tokens": 14695, "qid": "1090270", "query": "botulinum definition", "query_rewrites": 0, "reached": true, "relevant_docs": ["7067796"], "search_targets": ["botulinum definition"], "tool_calls": 3, "tool_sequence": ["deep_search", "get_document", "get_document"], "turns": 4, "unique_search_targets": 1, "unique_tools": 2}
{"completion_tokens": 1042, "duplicate_tool_calls": 0, "elapsed_sec": 53.772729511838406, "empty_tool_calls": 0, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 40, "found_relevant_docs": [], "prompt_tokens": 29692, "qid": "1101279", "query": "do physicians pay for insurance from their salaries?", "query_rewrites": 4, "reached": false, "relevant_docs": ["7067891"], "search_targets": ["physicians pay for insurance from their salaries medical malpractice", "1027637", "do physicians pay for malpractice insurance out of their salary", "2055614", "physicians pay for their own health insurance disability insurance from salary", "who pays for physician malpractice insurance employer or doctor self-employed"], "tool_calls": 6, "tool_sequence": ["search", "get_document", "search", "get_document", "search", "search"], "turns": 5, "unique_search_targets": 6, "unique_tools": 2}
{"completion_tokens": 1084, "duplicate_tool_calls": 0, "elapsed_sec": 60.38634355319664, "empty_tool_calls": 0, "first_relevant_tool_calls": 1, "first_relevant_turn": 1, "found_ids_count": 25, "found_relevant_docs": ["7068066"], "prompt_tokens": 28315, "qid": "201376", "query": "here there be dragons comic", "query_rewrites": 2, "reached": true, "relevant_docs": ["7068066"], "search_targets": ["here there be dragons comic", "7068066", "7068073", "7068070", "\"here there be dragons\" comic james owen", "\"the deep: here be dragons\" comic", "7068069", "6132021"], "tool_calls": 8, "tool_sequence": ["deep_search", "get_document", "get_document", "get_document", "search", "search", "get_document", "get_document"], "turns": 5, "unique_search_targets": 8, "unique_tools": 3}
{"completion_tokens": 1279, "duplicate_tool_calls": 0, "elapsed_sec": 67.03253900120035, "empty_tool_calls": 3, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 30, "found_relevant_docs": [], "prompt_tokens": 29882, "qid": "54544", "query": "blood diseases that are sexually transmitted", "query_rewrites": 10, "reached": false, "relevant_docs": ["7068203"], "search_targets": ["blood diseases sexually transmitted", "sexually transmitted blood diseases list hepatitis hiv syphilis", "bloodborne diseases sexually transmitted infections list", "blood diseases sexually transmitted list", "hiv hepatitis b syphilis sexually transmitted blood", "blood tested syphilis hepatitis west nile virus sexually transmitted", "hepatitis b sexually transmitted blood disease", "hiv aids sexually transmitted blood infection", "three types of hepatitis virus sexually transmitted", "sexually transmitted and blood borne infections stbbi"], "tool_calls": 10, "tool_sequence": ["deep_search", "deep_search", "search", "get_document", "get_document", "get_document", "search", "search", "get_document", "get_document"], "turns": 5, "unique_search_targets": 10, "unique_tools": 3}
{"completion_tokens": 563, "duplicate_tool_calls": 0, "elapsed_sec": 38.8562790742144, "empty_tool_calls": 0, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 5, "found_relevant_docs": [], "prompt_tokens": 11027, "qid": "118457", "query": "define bona fides", "query_rewrites": 1, "reached": false, "relevant_docs": ["7068493"], "search_targets": ["define bona fides", "bona fides definition"], "tool_calls": 3, "tool_sequence": ["deep_search", "get_document", "get_document"], "turns": 3, "unique_search_targets": 2, "unique_tools": 2}
{"completion_tokens": 1770, "duplicate_tool_calls": 0, "elapsed_sec": 75.8238385450095, "empty_tool_calls": 2, "first_relevant_tool_calls": 6, "first_relevant_turn": 3, "found_ids_count": 25, "found_relevant_docs": ["7068519"], "prompt_tokens": 31934, "qid": "178627", "query": "effects of detox juice cleanse", "query_rewrites": 12, "reached": true, "relevant_docs": ["7068519", "7068520"], "search_targets": ["effects of detox juice cleanse", "effects of detox juice cleanse benefits side effects", "detox juice cleanse effects side effects weight loss", "detox juice cleanse effects", "detox juice cleanse health effects benefits", "juice cleanse side effects weight loss detox", "juice cleanse health benefits effects", "common side effects of detox juices", "positive side effects of detox cleanse weight loss", "detox benefits better health glowing skin better digestion increased energy", "juice cleanse best ways to detox losing weight", "juice cleanse negative side effects headaches fatigue detox", "health enhancements juice cleanse breaking cravings"], "tool_calls": 13, "tool_sequence": ["deep_search", "get_document", "get_document", "get_document", "search", "search", "get_document", "get_document", "get_document", "get_document", "get_document", "search", "get_document"], "turns": 5, "unique_search_targets": 13, "unique_tools": 3}
{"completion_tokens": 927, "duplicate_tool_calls": 0, "elapsed_sec": 41.57614717306569, "empty_tool_calls": 0, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 10, "found_relevant_docs": [], "prompt_tokens": 12403, "qid": "1101278", "query": "do prince harry and william have last names", "query_rewrites": 1, "reached": false, "relevant_docs": ["7068907"], "search_targets": ["prince harry william last name british royal family surname", "7068913", "67332", "2919121"], "tool_calls": 4, "tool_sequence": ["search", "get_document", "get_document", "get_document"], "turns": 3, "unique_search_targets": 4, "unique_tools": 2}
{"completion_tokens": 1350, "duplicate_tool_calls": 0, "elapsed_sec": 59.2060615378432, "empty_tool_calls": 0, "first_relevant_tool_calls": 1, "first_relevant_turn": 1, "found_ids_count": 5, "found_relevant_docs": ["7069266"], "prompt_tokens": 17972, "qid": "68095", "query": "can hives be a sign of pregnancy", "query_rewrites": 4, "reached": true, "relevant_docs": ["7069266"], "search_targets": ["can hives be a sign of pregnancy", "hives and pregnancy", "hives early pregnancy symptom", "hives during pregnancy", "hives during pregnancy causes", "3114980", "7069269", "6159958"], "tool_calls": 8, "tool_sequence": ["deep_search", "get_document", "get_document", "get_document", "get_document", "get_document", "get_document", "get_document"], "turns": 4, "unique_search_targets": 8, "unique_tools": 2}
{"completion_tokens": 1155, "duplicate_tool_calls": 0, "elapsed_sec": 64.58518175315112, "empty_tool_calls": 0, "first_relevant_tool_calls": 5, "first_relevant_turn": 3, "found_ids_count": 15, "found_relevant_docs": ["7069601"], "prompt_tokens": 27869, "qid": "87892", "query": "causes of petechial hemorrhage", "query_rewrites": 8, "reached": true, "relevant_docs": ["7069601"], "search_targets": ["causes of petechial hemorrhage petechiae", "causes of petechial hemorrhage petechiae causes", "petechial hemorrhage causes medical conditions thrombocytopenia infection trauma", "causes of petechial hemorrhages injury trauma brain", "causes of petechiae medical conditions", "petechial rash causes", "petechial hemorrhage causes factors medical conditions infections trauma", "causes of intracranial hemorrhage petechial"], "tool_calls": 10, "tool_sequence": ["deep_search", "get_document", "get_document", "get_document", "search", "get_document", "get_document", "get_document", "get_document", "get_document"], "turns": 5, "unique_search_targets": 8, "unique_tools": 3}
{"completion_tokens": 1117, "duplicate_tool_calls": 0, "elapsed_sec": 54.08409190410748, "empty_tool_calls": 0, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 25, "found_relevant_docs": [], "prompt_tokens": 19436, "qid": "257309", "query": "how long does it take to get your bsrn if you already have a bachelors degree", "query_rewrites": 5, "reached": false, "relevant_docs": ["4959637"], "search_targets": ["bsrn second degree bachelors how long does it take", "accelerated bsn second degree how long", "bsrn program duration second bachelor degree nursing", "accelerated bsn second degree 11 to 18 months", "accelerated bsn program how long second degree"], "tool_calls": 5, "tool_sequence": ["deep_search", "search", "search", "get_document", "get_document"], "turns": 4, "unique_search_targets": 5, "unique_tools": 3}
{"completion_tokens": 1504, "duplicate_tool_calls": 0, "elapsed_sec": 70.11464598635212, "empty_tool_calls": 3, "first_relevant_tool_calls": 7, "first_relevant_turn": 3, "found_ids_count": 25, "found_relevant_docs": ["7070556"], "prompt_tokens": 31491, "qid": "1090242", "query": "symptoms of ptsd in vietnam veterans", "query_rewrites": 13, "reached": true, "relevant_docs": ["7070556"], "search_targets": ["symptoms of ptsd in vietnam veterans", "ptsd symptoms vietnam veterans", "ptsd symptoms vietnam veterans stress reactions", "vietnam veterans ptsd symptoms decades later", "vietnam syndrome symptoms physical psychological", "ptsd symptoms vietnam veterans nightmares flashbacks hypervigilance", "vietnam veterans ptsd clinical symptoms diagnostic", "symptoms of ptsd", "ptsd symptoms severe anxiety flashbacks nightmares uncontrollable thoughts", "ptsd symptoms reliving event bad memories nightmares flashbacks fear horror", "ptsd symptoms insomnia anxiety irritability vietnam veterans", "ptsd symptoms flashbacks nightmares severe anxiety uncontrollable thoughts", "ptsd symptoms must persist for at least a month diagnosis", "vietnam war ptsd rates national vietnam veterans readjustment study"], "tool_calls": 14, "tool_sequence": ["deep_search", "get_document", "get_document", "get_document", "get_document", "search", "search", "get_document", "get_document", "get_document", "get_document", "get_document", "get_document", "get_document"], "turns": 5, "unique_search_targets": 14, "unique_tools": 3}
{"completion_tokens": 820, "duplicate_tool_calls": 0, "elapsed_sec": 45.28732342366129, "empty_tool_calls": 0, "first_relevant_tool_calls": 0, "first_relevant_turn": 0, "found_ids_count": 30, "found_relevant_docs": [], "prompt_tokens": 24872, "qid": "211691", "query": "how coffee works quote", "query_rewrites": 3, "reached": false, "relevant_docs": ["7070643"], "search_targets": ["how coffee works quote", "\"how coffee works\" quote", "\"how coffee works\" famous quote saying", "\"coffee\" \"how it works\" quote"], "tool_calls": 5, "tool_sequence": ["search", "deep_search", "get_document", "search", "search"], "turns": 5, "unique_search_targets": 4, "unique_tools": 3}
{"completion_tokens": 1089, "duplicate_tool_calls": 0, "elapsed_sec": 44.62640297180042, "empty_tool_calls": 0, "first_relevant_tool_calls": 1, "first_relevant_turn": 1, "found_ids_count": 5, "found_relevant_docs": ["7070877"], "prompt_tokens": 15925, "qid": "165002", "query": "does contraction of the ciliary muscles shorten the lens", "query_rewrites": 1, "reached": true, "relevant_docs": ["7070877"], "search_targets": ["contraction of ciliary muscles shortens lens", "7070881", "7070877", "328493"], "tool_calls": 4, "tool_sequence": ["deep_search", "get_document", "get_document", "get_document"], "turns": 4, "unique_search_targets": 4, "unique_tools": 2}
{"completion_tokens": 631, "duplicate_tool_calls": 0, "elapsed_sec": 34.6135217868723, "empty_tool_calls": 0, "first_relevant_tool_calls": 1, "first_relevant_turn": 1, "found_ids_count": 5, "found_relevant_docs": ["7070950"], "prompt_tokens": 10514, "qid": "1101276", "query": "do spiders eat other animals", "query_rewrites": 1, "reached": true, "relevant_docs": ["7070950"], "search_targets": ["do spiders eat other animals", "what do spiders eat? predatory on other animals"], "tool_calls": 2, "tool_sequence": ["deep_search", "get_document"], "turns": 3, "unique_search_targets": 2, "unique_tools": 2}