claude-code-chat-browser: Benchmark regression gate in CI

## Calendar Day

Wednesday, June 17, 2026 (**PR 2 of 2**)

## Planned Effort

**3 story points** — sprint item **#6** (Low)

**Depends on:** Wednesday PR 1 (session cache #4) merged — baselines must reflect cached-path performance.

**Builds on:** Week 2 PR #76 benchmark harness and `benchmarks/baselines.json` schema.

## Problem

The CI `benchmarks` job runs `pytest tests/benchmarks/ --benchmark-only` and uploads artifacts, but the job is labeled "informational" and enforces no threshold. `benchmarks/baselines.json` has empty groups, so regressions in parse, export, and search pass silently.

## Goal

One merged PR that populates baselines from a post-cache run, adds a +20% regression gate, documents how to update baselines, and renames the CI job to signal that it is gated.

## Scope

### Touch points

- `benchmarks/baselines.json` — populate means (parse small/medium/large, export, search)
- `scripts/check_benchmark_regression.py` (new) — compare current vs baseline, exit non-zero if >20%
- `.github/workflows/ci.yml` — regression step after benchmark run; rename job
- `Makefile` or docs — `make update-baselines` command
- Unit test for missing-baseline graceful handling (warn, don't fail)
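A minimal sketch of what `make update-baselines` could run. It assumes the benchmark step writes pytest-benchmark's `--benchmark-json` output (a `"benchmarks"` list whose entries carry `"name"`, `"group"`, and `"stats"["mean"]` in seconds) and, as a loud assumption, that `baselines.json` stores `{"groups": {group: {benchmark_name: mean_seconds}}}`. The real schema from PR #76 may differ, and `build_baselines` / `write_baselines` are hypothetical names.

```python
import json
from collections import defaultdict


def build_baselines(results: dict) -> dict:
    """Collapse pytest-benchmark JSON into {"groups": {group: {name: mean}}}.

    ASSUMPTION: this shape stands in for the baselines.json schema from PR #76.
    """
    groups: dict[str, dict[str, float]] = defaultdict(dict)
    for bench in results["benchmarks"]:
        # pytest-benchmark reports "group" as None when no group was assigned.
        group = bench.get("group") or "ungrouped"
        groups[group][bench["name"]] = bench["stats"]["mean"]
    return {"groups": dict(groups)}


def write_baselines(results_path: str, baselines_path: str) -> None:
    """Hypothetical update-baselines body: read results, overwrite baselines."""
    with open(results_path) as f:
        results = json.load(f)
    with open(baselines_path, "w") as f:
        # sort_keys keeps diffs of baselines.json stable between updates.
        json.dump(build_baselines(results), f, indent=2, sort_keys=True)
        f.write("\n")
```

Regenerating from a full post-cache run on ubuntu-latest (rather than hand-editing means) keeps the baselines comparable with what the gated job measures.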

### Gate behavior

- Fail if `current_mean / baseline_mean > 1.20`
- Benchmarks with no baseline entry (e.g. newly added names): warn and do not fail (exit 0 unless another benchmark regresses)
- Gate on ubuntu-latest job only (avoid cross-OS variance)
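The gate rules above can be sketched as follows. This is a minimal illustration, not the script itself: it assumes pytest-benchmark's `--benchmark-json` layout for the current results and, hypothetically, a `baselines.json` shaped like `{"groups": {group: {benchmark_name: mean_seconds}}}`, flattened on the assumption that benchmark names are unique across groups. `check` and `main` are hypothetical names.

```python
import json
import sys

THRESHOLD = 1.20  # fail when current_mean / baseline_mean > 1.20


def current_means(results: dict) -> dict[str, float]:
    # pytest-benchmark JSON: each entry has "name" and "stats"["mean"] (seconds).
    return {b["name"]: b["stats"]["mean"] for b in results["benchmarks"]}


def baseline_means(baselines: dict) -> dict[str, float]:
    # ASSUMPTION: {"groups": {group: {name: mean}}}; names unique across groups.
    flat: dict[str, float] = {}
    for group in baselines.get("groups", {}).values():
        flat.update(group)
    return flat


def check(results: dict, baselines: dict, threshold: float = THRESHOLD) -> int:
    """Return an exit code: 1 if any benchmark regressed past threshold, else 0."""
    base = baseline_means(baselines)
    failed = False
    for name, mean in sorted(current_means(results).items()):
        if name not in base:
            # New benchmark with no baseline yet: warn, but never fail on it.
            print(f"WARNING: no baseline for {name!r}; skipping", file=sys.stderr)
            continue
        ratio = mean / base[name]
        if ratio > threshold:
            print(f"REGRESSION {name}: {ratio:.2f}x baseline (limit {threshold:.2f}x)")
            failed = True
    return 1 if failed else 0


def main(argv: list[str]) -> int:
    """CLI body: main([results_json_path, baselines_json_path])."""
    results_path, baselines_path = argv
    with open(results_path) as f:
        results = json.load(f)
    with open(baselines_path) as f:
        baselines = json.load(f)
    return check(results, baselines)
```

As a script, the entry point would end with `sys.exit(main(sys.argv[1:]))`, matching the two-argument call in the Verification section. Returning an exit code from `check` rather than calling `sys.exit` inside it keeps the missing-baseline path unit-testable.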

## Acceptance Criteria

- [ ] `benchmarks/baselines.json` populated from post-cache ubuntu run
- [ ] CI fails when a >20% regression is injected, and passes when every benchmark is within the threshold
- [ ] Missing-baseline case warns without failing (tested)
- [ ] `make update-baselines` (or documented equivalent) regenerates baselines
- [ ] Job renamed from "(informational)" to gated label
- [ ] PR approved by at least 1 reviewer
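One way to exercise the injected-regression criterion locally (an approach assumed here, not specified by the plan) is to inflate every mean in a real pytest-benchmark results file by 25% and run the gate against the doctored copy. `inject_regression` and `write_doctored` are hypothetical helpers.

```python
import copy
import json


def inject_regression(results: dict, factor: float = 1.25) -> dict:
    """Return a copy of pytest-benchmark results with every mean scaled by factor."""
    # Deep-copy so the caller's results dict is left untouched.
    doctored = copy.deepcopy(results)
    for bench in doctored["benchmarks"]:
        bench["stats"]["mean"] *= factor
    return doctored


def write_doctored(src_path: str, dst_path: str, factor: float = 1.25) -> None:
    """Read a results JSON file and write an inflated copy alongside it."""
    with open(src_path) as f:
        results = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(inject_regression(results, factor), f, indent=2)
```

With a factor of 1.25 against baselines recorded from a comparable run, the gate should exit non-zero on the doctored file and zero on the untouched one.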

## Verification

```powershell
cd C:\Users\Jasen\CppAliance\claude-code-chat-browser
.\.venv\Scripts\Activate.ps1
pytest tests/benchmarks/ --benchmark-only --benchmark-json=benchmark-results.json
python scripts/check_benchmark_regression.py benchmark-results.json benchmarks/baselines.json
```

## Out of Scope

- Session cache implementation (Wednesday PR 1 — #4)
- New benchmark scenarios beyond the existing three bench files

