perf: add mtime-invalidated LRU session cache; route APIs through it#90
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds ChangesSession Cache Layer and API Wiring
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/test_session_cache.py (1)
31-90: ⚡ Quick winAdd one test for relative-vs-absolute path aliasing.
The suite doesn’t yet assert the “absolute-path key” contract directly. Add a test that warms cache via a relative path and verifies absolute-path access is a hit (no reparse), so key normalization regressions are caught.
Suggested test
+def test_relative_and_absolute_paths_share_cache_entry( + sample_session: Path, monkeypatch: pytest.MonkeyPatch +) -> None: + rel_path = os.path.relpath(sample_session) + abs_path = str(sample_session.resolve()) + + get_cached_session(rel_path) + calls = 0 + + def counting_parse(p: str): + nonlocal calls + calls += 1 + return parse_session(p) + + monkeypatch.setattr("utils.session_cache.parse_session", counting_parse) + get_cached_session(abs_path) + assert calls == 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_session_cache.py` around lines 31 - 90, Add a new test function to validate the "absolute-path key" contract for path normalization. The test should follow a similar pattern to test_cache_hit_avoids_reparse by using monkeypatch to count parse_session calls, but instead of testing cache hits for the same path, it should warm the cache with a relative path to the sample_session file, then call get_cached_session with the absolute path of the same file and verify that the parse_session call count remains zero, confirming that relative and absolute paths resolve to the same cache key and prevent unnecessary reparses.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/benchmarks/test_cache_bench.py`:
- Around line 21-23: The cold() benchmark function includes the clear_cache()
call within its body, which means the benchmark timing measures both
cache-clearing overhead and actual parsing time together. Move the clear_cache()
call outside of the cold() function so that it executes before the benchmark
measurement begins, leaving only the get_cached_session(path) call to be
measured. This ensures the cold benchmark metric reflects only the actual
cache-miss parse cost without the reset overhead, making cold-vs-warm
comparisons accurate.
In `@tests/test_session_cache.py`:
- Around line 54-56: The test uses a fixed sleep duration to force mtime
changes, which is flaky on filesystems with coarse mtime resolution. Replace the
time.sleep(0.05) call with an explicit mtime manipulation using os.utime() on
the session file at path to ensure a deterministic modification time change,
then call sample_session.touch() and verify the cached session is updated. This
removes the timing dependency and makes the test reliable across different
filesystem types.
In `@utils/session_cache.py`:
- Around line 43-49: The set_max_entries function does not validate the
max_entries parameter before use. If a negative value is passed, the while loop
condition while len(_cache) > _max_entries will perpetually evaluate to true
when the cache contains entries, causing the loop to continuously call popitem
until the cache is empty, at which point popitem raises a KeyError. Add input
validation at the beginning of the set_max_entries function to ensure
max_entries is a non-negative integer, and raise an appropriate exception (such
as ValueError) if an invalid value is provided.
---
Nitpick comments:
In `@tests/test_session_cache.py`:
- Around line 31-90: Add a new test function to validate the "absolute-path key"
contract for path normalization. The test should follow a similar pattern to
test_cache_hit_avoids_reparse by using monkeypatch to count parse_session calls,
but instead of testing cache hits for the same path, it should warm the cache
with a relative path to the sample_session file, then call get_cached_session
with the absolute path of the same file and verify that the parse_session call
count remains zero, confirming that relative and absolute paths resolve to the
same cache key and prevent unnecessary reparses.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 51b61091-441c-4eeb-ab7c-a6ff72001b94
📒 Files selected for processing (9)
api/export_api.pyapi/projects.pyapi/search.pyapi/sessions.pytests/benchmarks/test_cache_bench.pytests/test_api_routes.pytests/test_error_propagation.pytests/test_session_cache.pyutils/session_cache.py
|
@CodeRabbit, please review this PR in more depth |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Read all changed files
echo "=== utils/session_cache.py ==="
cat utils/session_cache.py
echo ""
echo "=== api/sessions.py ==="
cat api/sessions.py
echo ""
echo "=== api/search.py ==="
cat api/search.py
echo ""
echo "=== api/projects.py ==="
cat api/projects.py
echo ""
echo "=== api/export_api.py ==="
cat api/export_api.py
echo ""
echo "=== tests/test_session_cache.py ==="
cat tests/test_session_cache.py
echo ""
echo "=== tests/benchmarks/test_cache_bench.py ==="
cat tests/benchmarks/test_cache_bench.py
echo ""
echo "=== tests/test_api_routes.py (relevant section) ==="
grep -n "get_cached_session\|parse_session\|PARSE_ERROR\|500\|parse_failure" tests/test_api_routes.py | head -60
echo ""
echo "=== tests/test_error_propagation.py ==="
cat tests/test_error_propagation.py
echo ""
echo "=== utils/jsonl_parser.py (to understand parse_session signature) ==="
cat utils/jsonl_parser.py
echo ""
echo "=== utils/__init__.py or similar ==="
ls utils/Length of output: 52141
In-Depth Review:
|
Re-read mtime after parse and only cache when the file was stable during the read; skip cache writes when max_entries is 0. Move get_cached_session to module-level import in projects.py; improve mtime/LRU tests; document concurrent cold-miss behavior. stop search scan after max_results across projects
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@utils/session_cache.py`:
- Around line 35-37: The os.path.getmtime(abspath) call can raise an OSError if
the file is removed or renamed after parse_session(abspath) succeeds, causing
the request to fail even though valid parsed data already exists. Wrap the
os.path.getmtime(abspath) call in a try-except block that catches OSError, and
when caught, skip the cache write logic and return the parsed value directly
since it is already valid.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1ff3e6cd-426d-4de3-9cb0-e9731794299d
📒 Files selected for processing (5)
api/projects.pyapi/search.pytests/test_error_propagation.pytests/test_session_cache.pyutils/session_cache.py
🚧 Files skipped from review as they are similar to previous changes (3)
- api/projects.py
- tests/test_session_cache.py
- api/search.py
Closes #82
Summary
Adds an mtime-invalidated, LRU-bounded in-memory session cache so repeated API reads stop re-parsing JSONL from disk. Routes all parse-backed API handlers through
get_cached_session()with no JSON response changes.Sprint item #4 (5 pt) — Wednesday PR 1 of 2. Merge before PR 2 (benchmark regression gate #6) so baselines capture cached-path performance.
Problem
Every session-touching request called
parse_session(filepath)from disk.get_session()andget_session_stats()parsed the same file twice;search()re-parsed the entire corpus per query. No in-memory cache, no mtime check, no reuse across requests.Changes
New:
utils/session_cache.py(mtime, SessionDict)os.path.getmtime()change (edits picked up without restart)OrderedDict(default max 200 entries)threading.Lockfor--debugreloader / threaded safetyget_cached_session(path),clear_cache(),set_max_entries()(tests)Wired routes
api/sessions.py—get_session+get_session_statsshare one cached parseapi/search.py— per-session loop uses cache (big win for repeated searches)api/projects.py—get_project_sessionsuses cache (get_projectsstill usesquick_session_info)api/export_api.py— single-session export uses cacheTests & benchmarks
tests/test_session_cache.py— cache hit, mtime invalidation, LRU evictiontests/benchmarks/test_cache_bench.py— cold parse vs warm hittest_api_routes.pyandtest_error_propagation.pyAcceptance criteria
utils/session_cache.pystores parsedSessionDictkeyed by absolute pathos.path.getmtime()changesapi/sessions.py—get_session+get_session_statsshare one cached parseapi/search.pyuses cacheapi/projects.pyandapi/export_api.pyuse cachetests/benchmarks/test_parse_memory.pystill passesTest plan
pytest -q— 382 passedpytest tests/benchmarks/test_parse_memory.py -qpytest tests/benchmarks/test_cache_bench.py --benchmark-onlymypy -p api -p utils -p modelsruff check .Out of scope
quick_session_infolightweight cacheutils/export_engine) — still callsparse_sessiondirectlyNotes
Cache benchmark (local, medium corpus): warm hit ~11µs vs cold parse ~1.45ms (~135× faster).
Summary by CodeRabbit
Release Notes
New Features
Performance Improvements
Bug Fixes
Tests