Calendar Day
Wednesday, June 17, 2026 (PR 2 of 2)
Planned Effort
3 story points — sprint item #6 (Low)
Depends on: Wednesday PR 1 (session cache #4) merged — baselines must reflect cached-path performance.
Builds on: Week 2 PR #76 benchmark harness and benchmarks/baselines.json schema.
Problem
The CI benchmarks job runs pytest tests/benchmarks/ --benchmark-only and uploads artifacts, but is labeled "informational" with no threshold. benchmarks/baselines.json has empty groups — regressions in parse/export/search pass silently.
Goal
One merged PR that populates baselines from a post-cache run, adds a +20% regression gate, documents baseline updates, and renames the CI job to signal it is gated.
Scope
Touch points
- benchmarks/baselines.json — populate means (parse small/medium/large, export, search)
- scripts/check_benchmark_regression.py (new) — compare current vs baseline, exit non-zero if >20%
- .github/workflows/ci.yml — regression step after benchmark run; rename job
- Makefile or docs — make update-baselines command
- Unit test for missing-baseline graceful handling (warn, don't fail)
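A rough sketch of what the make update-baselines target could call to populate the baseline means. It assumes the benchmark run is saved with pytest-benchmark's --benchmark-json option, and that baselines.json has the shape {"groups": {group: {benchmark_name: mean_seconds}}}. That shape is a guess for illustration; the real schema comes from PR #76 and may differ. The helper names build_baselines and write_baselines are hypothetical.

```python
import json


def build_baselines(report):
    """Map a pytest-benchmark report dict to {"groups": {group: {name: mean}}}.

    The output shape is an assumption; adjust to the real baselines.json schema.
    """
    groups = {}
    for bench in report["benchmarks"]:
        # pytest-benchmark sets "group" to null when no group was assigned.
        group = bench.get("group") or "ungrouped"
        groups.setdefault(group, {})[bench["name"]] = bench["stats"]["mean"]
    return {"groups": groups}


def write_baselines(report_path, baselines_path):
    """Regenerate the baselines file from a saved benchmark report."""
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    with open(baselines_path, "w", encoding="utf-8") as fh:
        # Sorted keys and a trailing newline keep baseline diffs reviewable.
        json.dump(build_baselines(report), fh, indent=2, sort_keys=True)
        fh.write("\n")
```

Writing sorted, indented JSON means a baseline refresh shows up in review as a small per-benchmark diff rather than a rewritten file.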
Gate behavior
- Fail if current_mean / baseline_mean > 1.20
- Missing baselines for new benchmark names: warn, return 0
- Gate on ubuntu-latest job only (avoid cross-OS variance)
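The gate rules above could be sketched roughly as follows. This is an illustration under assumptions, not the final scripts/check_benchmark_regression.py: it assumes benchmark-results.json is a pytest-benchmark --benchmark-json report, and that baselines.json looks like {"groups": {group: {benchmark_name: mean_seconds}}}, which is a guessed shape. All function names are hypothetical.

```python
import json
import sys

THRESHOLD = 1.20  # fail when current_mean / baseline_mean exceeds this ratio


def load_current_means(path):
    """Read {name: mean} from a pytest-benchmark --benchmark-json report."""
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    return {b["name"]: b["stats"]["mean"] for b in report["benchmarks"]}


def flatten_baselines(doc):
    """Flatten the assumed {"groups": {group: {name: mean}}} shape to {name: mean}."""
    flat = {}
    for benches in doc.get("groups", {}).values():
        flat.update(benches)
    return flat


def check_regressions(current, baselines, threshold=THRESHOLD):
    """Return (exit_code, messages) for two {name: mean_seconds} mappings."""
    messages = []
    failed = False
    for name, mean in sorted(current.items()):
        baseline = baselines.get(name)
        if not baseline:
            # Missing (or zero) baseline, e.g. a new benchmark: warn, don't fail.
            messages.append(f"WARN {name}: no baseline, skipping")
            continue
        ratio = mean / baseline
        status = "FAIL" if ratio > threshold else "OK"
        failed = failed or ratio > threshold
        messages.append(f"{status} {name}: {ratio:.2f}x baseline")
    return (1 if failed else 0), messages


def main(argv):
    current = load_current_means(argv[0])
    with open(argv[1], encoding="utf-8") as fh:
        baselines = flatten_baselines(json.load(fh))
    code, messages = check_regressions(current, baselines)
    print("\n".join(messages))
    return code


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1:]))
```

Keeping the comparison in a pure function over two mappings makes the missing-baseline unit test straightforward: pass a benchmark name that is absent from the baselines and assert the exit code stays 0.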
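Acceptance Criteria
- benchmarks/baselines.json populated from post-cache ubuntu run
- make update-baselines (or documented equivalent) regenerates baselines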
Verification
cd C:\Users\Jasen\CppAliance\claude-code-chat-browser
.\.venv\Scripts\Activate.ps1
pytest tests/benchmarks/ --benchmark-only
python scripts/check_benchmark_regression.py benchmark-results.json benchmarks/baselines.json
Out of Scope
- Session cache implementation (Wednesday PR 1 — #4)
- New benchmark scenarios beyond existing three bench files