perf(d-4 bootstrap): workflow_dispatch automation for baseline collection (standards#99) #26

Merged
hyperpolymath merged 1 commit into main from perf/d4-rebaseline-automation on May 30, 2026

@hyperpolymath
Owner

Summary

Phase D-4 of the single-lane HCG tier-2 channel (standards#91 / #99) is "real baseline numbers populated in bench/baseline.json and the perf-regression gate armed by flipping `_status` to `active`". The rebaseline ritual in docs/perf-contract.md § Baseline lifecycle defines step 2 as running `just bench-collect` on a CI-equivalent target. The published reference target in the same doc is the `ubuntu-latest` GHA runner, yet the ritual was written as a manual local step that requires the operator to have an Elixir 1.19 / OTP 28 toolchain on their machine. That gap meant D-4 could not be executed by anyone without a matching local dev env.

This PR adds the missing on-the-published-target automation. It does not collect numbers or flip _status itself; those remain the maintainer's deliberate acts. It provides the means.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99

NOT Closes #99: joint-close is owner-only, and D-4 baseline collection + the `_status` flip remain pending under #99 even after this lands. This PR makes D-4 executable; the generated rebaseline PR + the flip-to-active jointly close it. Same posture as PR #14 (D-2) and PR #22 (D-3).

What landed

  • `.github/workflows/perf-rebaseline.yml` — a `workflow_dispatch`-only workflow that runs `mix run bench/gateway_latency.exs` on `ubuntu-latest` (the same target perf-regression.yml uses, so the numbers are comparable), pipes the results through bench/rebaseline.exs, and opens a `perf: rebaseline (standards#99)` PR with the regenerated bench/baseline.json. Its SHA-pinned actions match perf-regression.yml, and it reuses the same `runner.os-perf-${hashFiles(mix.lock)}` cache key so the first rebaseline primes off the warm cache. It deliberately sets no `concurrency` / `cancel-in-progress`: an operator-initiated rebaseline should run to completion, since unlike per-PR runs it is never made obsolete by a newer push.
  • bench/rebaseline.exs — reads bench/results.json + bench/baseline.json, writes a new bench/baseline.json with real p50/p95/p99/ips per scenario. Preserves _comment, _schema_version, tolerance, and per-scenario _comment_* fields. Leaves _status as scaffold-placeholder — flipping to active is the maintainer's separate review step. Field order is preserved end-to-end via Jason.OrderedObject (decode with objects: :ordered_objects, encode back; Jason 1.4+ is already in mix.lock) so the rebaseline diff is review-grade (numbers move; structure does not).
  • Justfile — adds `just rebaseline` (runs the harness + the regeneration script) so the same two-step sequence the workflow runs is also available locally for operators previewing a rebaseline. The `just bench-collect` message is updated to point at `just rebaseline`.
  • docs/perf-contract.md — splits § Baseline lifecycle into "Automated (preferred — D-4 bootstrap)" and "Manual (for local previews or operators without GHA access)". Both paths leave _status as scaffold-placeholder; flipping to active is called out as a separate deliberate decision in either path.
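The regeneration step that bench/rebaseline.exs performs can be sketched roughly as below. This is a hypothetical Python illustration, not the script itself (which is Elixir and uses `Jason.OrderedObject`). It assumes per-scenario entries live under a top-level `scenarios` key in baseline.json, that results.json maps scenario names to `p50`/`p95`/`p99`/`ips` values, and that names such as `route_small` are placeholders. Python dicts keep insertion order and `json.dumps` emits keys in that order, which stands in for the ordered-object round-trip.

```python
import json

# Hypothetical sketch only; the real bench/rebaseline.exs is Elixir.
# Assumption: per-scenario data lives under a top-level "scenarios" key.
METRICS = ("p50", "p95", "p99", "ips")


def rebaseline(results: dict, baseline: dict) -> dict:
    """Return a new baseline with measured metrics copied in from results.

    Every other key (_comment, _schema_version, tolerance, _status, and
    per-scenario _comment_* fields) is copied unchanged and in its original
    position, so a diff of the output shows only the numbers moving.
    _status is never rewritten: flipping it to "active" stays a manual step.
    """
    new = {}
    for key, value in baseline.items():
        if key != "scenarios":
            new[key] = value
            continue
        scenarios = {}
        for name, fields in value.items():
            measured = results.get(name, {})
            # Overwrite only metric fields that were actually measured;
            # unmeasured metrics keep their old value (e.g. a TODO sentinel).
            # Scenarios absent from the baseline are not added in this sketch.
            scenarios[name] = {
                k: (measured.get(k, v) if k in METRICS else v)
                for k, v in fields.items()
            }
        new[key] = scenarios
    return new


if __name__ == "__main__":
    baseline = json.loads(
        '{"_comment": "perf baseline", "_schema_version": 1,'
        ' "_status": "scaffold-placeholder", "tolerance": {"p95": 1.2},'
        ' "scenarios": {"route_small": {"_comment_route_small": "hot path",'
        ' "p50": "TODO", "p95": "TODO", "p99": "TODO", "ips": "TODO"}}}'
    )
    results = {"route_small": {"p50": 110.0, "p95": 180.5, "p99": 240.0, "ips": 8900.0}}
    # Key order in the output matches the input baseline.
    print(json.dumps(rebaseline(results, baseline), indent=2))
```

Iterating the baseline rather than the results is a choice made for this sketch: it keeps the output's structure identical to the input, at the cost of silently ignoring scenarios that exist only in the results (the drift case deferred to the D-3 follow-up).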

What is deliberately NOT in this PR

  • The real baseline numbers themselves — those land in the generated perf: rebaseline (standards#99) PR after the maintainer dispatches the workflow.
  • Flipping bench/baseline.json _status to active — maintainer judgement on noise/spread; may land in the same generated PR or in a follow-up.
  • Tightening tolerance ratios — also a maintainer judgement, post-D-4.
  • Schema-drift hardening in bench/compare.exs (new scenario in results but not baseline silently passes in active mode) — separate defensive D-3 follow-up, not coupled to D-4 collection.

Test plan

  • CI: existing workflows (governance, hypatia-scan, dogfood-gate, codeql, scorecard, secret-scanner) are green on this PR. perf-regression.yml will also fire but stays in scaffold mode, since the bench/baseline.json `_status` is unchanged.
  • CI: Perf Rebaseline workflow does NOT auto-fire (workflow_dispatch only) on this PR's push.
  • Manual (post-merge, owner): dispatch the Perf Rebaseline workflow from the Actions tab → it should run the harness, regenerate bench/baseline.json, push a perf/rebaseline-<run-id> branch, and open a perf: rebaseline (standards#99) PR. bench/results.json + bench/console.log are uploaded as the perf-rebaseline-results artefact, retained for 30 days.
  • Manual (post-merge, owner): review the generated PR and sanity-check the numbers + spread, then either flip `_status` → `active` in the same PR (single-PR D-4 + D-3 close) or merge as-is and follow up with a `_status` flip PR.

Downstream unblock

The boj-server rollout-prerequisite checklist in docs/integration/hcg-tier2-rollout-runbook.md § 1.1 lists D-3 (gate armed) and D-4 (numbers populated) as the remaining open items gating Phase E rollout. After this PR lands, the path to ticking both boxes is one workflow dispatch + one (or two) maintainer review/merge events, no local Elixir/OTP toolchain required.

Owner merges; not for admin-merge.

🤖 Generated with Claude Code


…tion

Phase D-4 of the single-lane HCG tier-2 channel (standards#91 / #99)
is "real baseline numbers populated in bench/baseline.json and the
perf-regression gate armed by flipping _status to active". The
rebaseline ritual in docs/perf-contract.md § Baseline lifecycle
defines step 2 as `just bench-collect` on a CI-equivalent target.
The published reference per the same doc is the `ubuntu-latest` GHA
runner, yet the ritual was authored as a manual local step that
requires the operator to have an Elixir 1.19 / OTP 28 toolchain on
their machine. That gap blocked D-4 from being executable by anyone
without a matching local dev env.

This PR adds the missing on-the-published-target automation. It does
NOT collect numbers or flip _status itself; those are still the
maintainer's deliberate acts. It provides the means.

What landed
───────────

* `.github/workflows/perf-rebaseline.yml` — manual workflow_dispatch
  workflow that runs `mix run bench/gateway_latency.exs` on
  ubuntu-latest (same target perf-regression.yml uses, so numbers
  are comparable), pipes results through bench/rebaseline.exs, opens
  a `perf: rebaseline (standards#99)` PR with the regenerated
  bench/baseline.json. Uses the same SHA-pinned actions and the
  same `runner.os-perf-${hashFiles(mix.lock)}` cache key as
  perf-regression.yml so the first rebaseline run primes off the
  warm cache. Deliberately NO concurrency cancel-in-progress (an
  operator-initiated rebaseline should complete on its own; this
  workflow has no obsolescence relationship the way per-PR runs do).

* `bench/rebaseline.exs` — reads bench/results.json + bench/baseline.json,
  writes a new bench/baseline.json with real p50/p95/p99/ips per
  scenario. Preserves _comment, _schema_version, tolerance, and
  per-scenario _comment_* fields. Leaves _status as
  "scaffold-placeholder" — flipping to "active" is the maintainer's
  separate review step. Field order is preserved end-to-end via
  Jason.OrderedObject (decode with `objects: :ordered_objects`,
  encode back; Jason 1.4+ already in mix.lock) so the rebaseline
  diff is review-grade (numbers move; structure does not).

* `Justfile` — adds `just rebaseline` (runs harness + regeneration
  script) so the same two-step sequence the workflow runs is also
  available locally for operators previewing a rebaseline. The
  `just bench-collect` message is updated to point at `just rebaseline`.

* `docs/perf-contract.md` — splits § Baseline lifecycle into
  "Automated (preferred — D-4 bootstrap)" and "Manual (for local
  previews or operators without GHA access)". Both paths leave
  _status as scaffold-placeholder; flipping to active is called out
  as a separate deliberate decision in either path.

What is deliberately NOT in this PR
────────────────────────────────────

* The real baseline numbers themselves — those land in the generated
  `perf: rebaseline (standards#99)` PR after the maintainer dispatches
  the workflow.
* Flipping bench/baseline.json `_status` to "active" — maintainer
  judgement on noise/spread; may land in the same generated PR or in
  a follow-up.
* Tightening tolerance ratios — also a maintainer judgement, post-D-4.
* Schema-drift hardening in bench/compare.exs (new scenario in results
  but not baseline silently passes in active mode) — separate defensive
  D-3 follow-up, not coupled to D-4 collection.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99

(NOT Closes #99: joint-close is owner-only, and D-4 baseline collection
+ the _status flip still pend under #99 even after this lands. This PR
makes D-4 executable; the generated rebaseline PR + the flip-to-active
PR jointly close it.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hyperpolymath hyperpolymath marked this pull request as ready for review May 30, 2026 15:16
@hyperpolymath hyperpolymath merged commit d01c086 into main May 30, 2026
16 checks passed
@hyperpolymath hyperpolymath deleted the perf/d4-rebaseline-automation branch May 30, 2026 15:17
hyperpolymath added a commit that referenced this pull request Jun 2, 2026
…s#99) (#30)

Phase D-3 follow-up under the single-lane HCG tier-2 channel
(standards#91). PR #26 (D-4 bootstrap) deferred this as a "separate
defensive D-3 follow-up, not coupled to D-4 collection": when
bench/baseline.json `_status` is flipped to `active`, a scenario
present in results.json but absent from baseline.json (a new harness
scenario landed without rebaseline) silently passed the gate, and a
scenario present in baseline.json but absent from results.json (the
harness dropped a scenario without rebaselining) was never even
checked. Both directions of schema drift now fail-closed in active
mode and surface as informational "scaffold (would fail: ...)" rows
in scaffold-placeholder mode so a rebaseline PR previews the
active-mode verdict before the gate is armed.

The comparator now iterates the union of scenario names across
results and baseline rather than the results map alone, and uses a
single `enforce: bool` opt to pivot between scaffold and active mode
(replaces the previous `nil` sentinel). check_regression/5 also has
a latent crash fixed in the process — when baseline values are TODO
sentinels (or any non-number), num/1 returns nil and the `or` chain
raises BadBooleanError; the inner `&&` short-circuit already returns
nil for unknowns, so the outer joins are switched from `or` to `||`
to match. Previously this was masked by scaffold mode never reaching
check_regression at all (the `nil` sentinel skipped it); the new
flow exposes that path in scaffold mode too.
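The union-of-names iteration and the `enforce` pivot can be sketched as a small comparator. This is a hypothetical Python illustration, not the Elixir code in bench/compare.exs: the single multiplicative tolerance `ratio`, the checked metric set, and the verdict strings are all assumptions. Unknown values (TODO sentinels or any non-number) make a metric check return `None` instead of raising, which is the role the `||` switch plays in the Elixir fix.

```python
from typing import Optional

# Hypothetical sketch; metric set, ratio semantics and verdict strings are assumptions.
METRICS = ("p50", "p95", "p99")


def num(value) -> Optional[float]:
    """Return value as a float, or None for TODO sentinels and other non-numbers."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)


def regressed(current, base, ratio: float) -> Optional[bool]:
    """True/False when both values are numbers; None when either is unknown."""
    c, b = num(current), num(base)
    if c is None or b is None:
        return None
    return c > b * ratio


def compare(results: dict, baseline: dict, ratio: float, enforce: bool) -> dict:
    """Return {scenario: verdict} over the union of scenario names.

    enforce=True models active mode: drift in either direction, or a
    regression, fails. enforce=False models scaffold mode: the same checks
    run but are reported as "scaffold (would fail: ...)" instead.
    """
    verdicts = {}
    for name in sorted(set(results) | set(baseline)):
        if name not in baseline:
            problem = "missing from baseline"  # new scenario, no rebaseline
        elif name not in results:
            problem = "missing from results"  # scenario dropped from harness
        else:
            checks = [
                regressed(results[name].get(m), baseline[name].get(m), ratio)
                for m in METRICS
            ]
            # Only a definite True counts; None (unknown) is not a failure.
            problem = "regressed" if any(c is True for c in checks) else None
        if problem is None:
            verdicts[name] = "ok"
        elif enforce:
            verdicts[name] = f"fail: {problem}"
        else:
            verdicts[name] = f"scaffold (would fail: {problem})"
    return verdicts


def gate_passes(verdicts: dict) -> bool:
    """The gate fails only on 'fail:' verdicts, i.e. only in active mode."""
    return not any(v.startswith("fail:") for v in verdicts.values())
```

For example, a results map with an extra scenario yields `fail: missing from baseline` for it when `enforce=True`, and `scaffold (would fail: missing from baseline)` with the gate still passing when `enforce=False`; an all-TODO baseline yields `ok` rather than raising.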

docs/perf-contract.md gains a "Schema drift" section explaining the
two directions, the active vs scaffold display difference, and the
fail-closed semantic. The behaviour pivots on `_status` in
bench/baseline.json — no code change is needed to arm the schema
checks once Phase D-4 maintainer-only rebaseline + active flip lands.

Smoke-tested locally against synthetic results/baseline fixtures
(four cases: active+drift→regressed, scaffold+drift→ok-with-warnings,
active+clean→ok, active+TODO-sentinels→ok-no-crash). Build is not
verified end-to-end — the session environment has Elixir 1.14 only,
no Elixir 1.19 / OTP 28 toolchain — but Code.format_string!/1 reports
the file is already formatted and Code.string_to_quoted!/1 round-trips
under 1.14. Repo CI (`Perf Regression`, governance, hypatia-scan,
dogfood-gate, codeql, scorecard) is the verification gate.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99

NOT Closes #99: joint-close is owner-only; D-4 baseline collection
plus the `_status` flip to active still pend under #99 after this
lands. Same posture as PRs #14, #22, #26.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>