From 6e047bb98939caacc1efc9be5831abfa2db7c605 Mon Sep 17 00:00:00 2001
From: Laith Al-Saadoon
Date: Tue, 30 Jun 2026 04:52:11 +0000
Subject: [PATCH 1/3] docs: spec 011 + ADR 0020 — codehub replay (decision-equivalence)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Drafts the structural half of Move 6 for review (no code yet).

Spec 011 (.erpaval/specs/011-replay-decision-equivalence/spec.md): `codehub replay` asserts decision-equivalence — same inputs ⇒ same retrieval decision set (same files + byte ranges selected under the same budget) — via a `decisionHash` that projects ast-chunks + context-bom byteRanges and excludes incidental fields (tokenCount, pins, chunk text, fileHash). Byte-identity becomes the cheap sufficient witness, not the contract. Supersedes the byte-identity comparator in the unmerged e6a81c2 replay, reusing its integrity/recompute tiers. 5 open questions.

ADR 0020: decision-equivalence is the contract of record; the existing graphHash/packHash byte-identity gates stay as the witness fast path (no gate relaxed here). Corrects the embedder-swap framing — embeddings aren't in the pack and graphHash is embedder-neutral; the swap hits the index, not packHash/graphHash.

Pairs with the Move 2 variance probe as the data-backed "how well does OCH do" story.
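
For review, a minimal TypeScript sketch of the `decisionHash` described above, assuming the decision set has already been projected to `{ budgetTokens, selections }`. `DecisionSet`, `canonicalJsonSketch`, and `decisionHashSketch` are hypothetical names, and the sorted-key serializer is a simplified stand-in for the pack's RFC 8785 `canonicalJson` helper, not that helper itself.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the real types live in @opencodehub/pack.
type Span = [start: number, end: number]; // half-open [start, end)

interface DecisionSet {
  budgetTokens: number;
  selections: { path: string; ranges: Span[] }[]; // sorted by path
}

// Simplified stand-in for the RFC 8785 canonicalJson helper (assumption):
// object keys are sorted by UTF-16 code units, arrays keep their order, and no
// whitespace is emitted. For plain strings and integers this should match JCS.
function canonicalJsonSketch(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalJsonSketch(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    const body = entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJsonSketch(v)}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// sha256 over the canonical projection: only the budget, paths, and merged
// ranges reach the preimage, so tokenCount, pins, and fileHash cannot move it.
function decisionHashSketch(set: DecisionSet): string {
  return createHash("sha256").update(canonicalJsonSketch(set), "utf8").digest("hex");
}

// Usage: key order in the input object does not affect the hash; the budget does.
const fromA: DecisionSet = { budgetTokens: 100, selections: [{ path: "a.ts", ranges: [[0, 20]] }] };
const fromB: DecisionSet = { selections: [{ path: "a.ts", ranges: [[0, 20]] }], budgetTokens: 100 };
console.log(decisionHashSketch(fromA) === decisionHashSketch(fromB)); // true
console.log(decisionHashSketch(fromA) === decisionHashSketch({ ...fromA, budgetTokens: 200 })); // false
```

Hashing the projection instead of the manifest is what lets a chonkie or grammar-pin bump change `packHash` while leaving `decisionHash` alone.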
---
 .../011-replay-decision-equivalence/spec.md | 126 +++++++++++++++++
 ...on-equivalence-supersedes-byte-identity.md | 131 ++++++++++++++++++
 2 files changed, 257 insertions(+)
 create mode 100644 .erpaval/specs/011-replay-decision-equivalence/spec.md
 create mode 100644 docs/adr/0020-decision-equivalence-supersedes-byte-identity.md

diff --git a/.erpaval/specs/011-replay-decision-equivalence/spec.md b/.erpaval/specs/011-replay-decision-equivalence/spec.md
new file mode 100644
index 0000000..58abf41
--- /dev/null
+++ b/.erpaval/specs/011-replay-decision-equivalence/spec.md
@@ -0,0 +1,126 @@
+# Spec 011 — `codehub replay`: assert decision-equivalence structurally
+
+**Status:** Draft for review (NO code yet — review the contract pivot first).
+**Author:** Bonk + Laith · **Date:** 2026-06-30
+**Branch:** `spec/011-replay-decision-equivalence` (off `main` @ `278702a`)
+**Roadmap origin:** M-W-F run 2026-06-29, Move 6 ruling (decision-equivalence). This is the *structural* half; spec 010 / `pack --variance-probe` is the *behavioral* half (shipped, PR #269).
+**Companion ADR:** 0020 — "decision-equivalence supersedes byte-identity as the pack contract" (drafted alongside this spec; 0020 is the next free ADR number, confirmed).
+
+---
+
+## 0. The two halves of Move 6, and why this one is the keystone
+
+Laith ruled (2026-06-30): the pack contract pivots from **byte-identity** to **decision-equivalence** — "same inputs ⇒ provably the same *retrieval decision set* (same files + byte ranges selected under the same budget); byte-identity is one cheap witness, not the contract."
+
+- **Move 2 (`pack --variance-probe`, shipped)** measures the contract *behaviorally*: does an agent's answer wander less with the pack? A number, but an *empirical* one — it runs a stochastic agent and observes outcomes.
+- **Move 6 structural half (`codehub replay`, this spec)** asserts the contract *structurally*: given the same inputs, did OCH select the **same decision set**? No agent, no stochasticity — a deterministic structural check.
+
+Laith's framing: this is "critical for data-backed results on how well OCH does." The variance probe says *the pack helps*; `replay` says *the pack is what we claim it is* — reproducible at the decision level, even when the bytes legitimately drift. Behavioral benefit (Move 2) on top of a structural guarantee (this) is the proof story.
+
+## 1. Diagnosis — why byte-identity is the wrong *contract* (it's the right *witness*)
+
+The standing invariant is **ROADMAP constraint U1: "graphHash byte-identity per commit"** (`.erpaval/ROADMAP.md:219` names it the one breaking-change budget OCH must preserve; `:201-202` lists U1/U2 as CI gates). It is asserted across six ADRs — the canonical statement is **ADR 0011** (`graphHash` = SHA-256 of the canonical-JSON `{edges, nodes}` projection, store-agnostic), and **ADR 0019** titles a section "graphHash byte-identity (the go/no-go)". The pack inherits it: `packHash` (`manifest.ts:52`) is `sha256(canonicalJson(manifest))`, and `pack-determinism.test.ts` asserts two runs produce byte-identical BOM files. The user-facing promise is `readme.ts:73`: *"same `(commit, tokenizer_id, budget_tokens, chonkie_version, grammar_commits)` produces a byte-identical pack and the same `pack_hash`."*
+
+Byte-identity is a fine *witness* but a brittle *contract*, because bytes are a poor proxy for the decision the auditor actually cares about:
+
+- **The `packHash` preimage binds incidental fields.** It includes `pins.chonkieVersion`, `pins.grammarCommits`, and each BOM file's `fileHash` (`manifest.ts:82-101`). A chonkie bump, a grammar-pin refresh, or a `tokenCount` recompute flips `packHash` — even when *the exact same byte ranges of the exact same files were selected under the same budget*. The promise in `readme.ts:73` literally lists `chonkie_version` and `grammar_commits` as inputs, conceding that a toolchain bump is a "different" pack. Under decision-equivalence it is the *same* pack.
+- **The embedder-swap precedent — stated precisely.** The #252 swap (gte-modernbert → F2LLM-v2-80M, 320-dim) is the canonical "decision-irrelevant change." Precision matters here, because spec 010 §0 over-stated it: embeddings are **not** in the pack (the Parquet sidecar was dropped in ADR 0019; the BOM is **8 items**), and `graphHash` is embedder-neutral by construction (ADR 0014: it hashes only `{nodes, edges}`, never `store_meta`). So the swap breaks **neither** `packHash` **nor** `graphHash` today — it invalidates the `embeddings` table and the `store_meta` embedder fingerprint, forcing a re-index. The lesson is the general one: a legitimate change to *how* OCH builds the index (a better embedder, a newer grammar, a re-tokenizer) is exactly the kind of change a naive "did the bytes change?" auditor misreads as "the pack changed," when the retrieval decision — which files/ranges the agent saw — is identical.
+- **An auditor doesn't care about bytes.** They care whether the agent's context came from the right places. Byte-identity over-promises (claims more than the contract needs) and under-delivers (breaks on changes the contract should tolerate).
+
+The fix: make the **decision set** the contract of record, and keep byte-identity as a *sufficient-but-not-necessary witness* of it.
+
+## 1.5. There is already a byte-identity `replay` — this supersedes its comparator
+
+A `codehub replay ` + `pack --prove` implementation already exists on the unmerged branch `feat/v1-distribution-breadth` (`e6a81c2`, not an ancestor of `main`). It is the **byte-identity predecessor** this spec supersedes. Its design (worth reusing):
+
+- `runReplay(hash)` reads `/.codehub/packs//manifest.json`, parses snake_case→camelCase.
+- **Integrity tier** (always, offline): re-hash every BOM body on disk vs its attested `fileHash`; mismatch → hard drift.
+- **Recompute tier:** re-derive `packHash` via `buildManifest`, assert equality.
+- **Optional re-pack tier:** an injected `RepackDriver` checks out the commit, re-runs the packer, **byte-compares** `packHash`. `best_effort` (Claude tokenizer) tolerates re-pack drift; `strict`/`degraded` hard-fail on any byte difference.
+- Verdict via `replayVerdict(r) → { line, exitCode }`.
+
+**What spec 011 changes:** the re-pack tier's comparator flips from *byte-identity* to *decision-equivalence*. A re-pack that drifts in bytes but selects the same decision set → `EQUIVALENT` (today: a `strict`-class drift would hard-fail). A re-pack that changes the decision set → `DIVERGED` (fail). The integrity + recompute tiers stay as the cheap byte-witness fast path.
+
+**Reuse + cleanup:** lift `parseManifest`, the tiered verdict, the `RepackDriver` seam, and `recomputePackHash` from `e6a81c2`. Its `parseManifest` still reads a `duckdb_version` pin and `schemaVersion: 1` — both stale post-ADR-0019 (current schema is `2`, no duckdb pin); drop them on the rebase. (Also: the `code-pack` CLI description and the ROADMAP still say "9-item BOM" — stale since ADR 0019; clean up to "8-item" in passing.)
+
+## 2. What the decision set IS (grounded in the current pack)
+
+The pack already encodes the decision set — this spec *projects* existing artifacts, it invents no new shape:
+
+- **`ast-chunks.jsonl`** — each row is an `AstChunk` (`ast-chunker.ts:68`): `{ path, startByte, endByte, tokenCount, language? }`, sorted `(path ASC, startByte ASC, endByte ASC)`. The `(path, startByte, endByte)` triple is *literally* "which file, which byte range, was selected under budget."
+- **`context-bom.json`** — each `file` component (`context-bom.ts`) carries path, content hash, and an optional `opencodehub:byteRanges` property: merged, sorted, non-overlapping `[start, end)` spans (`mergeSpans`, `context-bom.ts:170`) — "the union of bytes read from this file." This is already a deterministic, byte-range projection independent of chunk text.
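
The projection from `ast-chunks.jsonl` rows to a decision set can be sketched as below. This is a hypothetical illustration, not the `@opencodehub/pack` code: `ChunkRow`, `toDecisionSet`, and `mergeSpansSketch` are invented names, and merging touching spans (`[0,10)` plus `[10,20)` into `[0,20)`) is an assumption about how `mergeSpans` treats adjacency. `tokenCount` is accepted on input but never copied out, which is why tokenCount-only drift stays decision-equivalent.

```typescript
// Hypothetical shapes; field names follow the AstChunk row described above.
interface ChunkRow {
  path: string;
  startByte: number;
  endByte: number;
  tokenCount?: number; // present in ast-chunks.jsonl, deliberately not projected
}
type Span = [start: number, end: number]; // half-open [start, end)

interface PathSelection {
  path: string;
  ranges: Span[]; // sorted, non-overlapping
}

// Merge half-open spans. Touching spans ([0,10) and [10,20)) are merged as well,
// which is an assumption about how the real mergeSpans treats adjacency.
function mergeSpansSketch(spans: Span[]): Span[] {
  const sorted = [...spans].sort((x, y) => x[0] - y[0] || x[1] - y[1]);
  const out: Span[] = [];
  for (const [start, end] of sorted) {
    const last = out[out.length - 1];
    if (last !== undefined && start <= last[1]) {
      last[1] = Math.max(last[1], end); // overlap or adjacency: extend the open span
    } else {
      out.push([start, end]); // gap: start a new span
    }
  }
  return out;
}

// Project chunk rows to { budgetTokens, selections }: group by path, merge the
// spans per path, and sort selections by path. tokenCount is never copied.
function toDecisionSet(
  rows: ChunkRow[],
  budgetTokens: number,
): { budgetTokens: number; selections: PathSelection[] } {
  const byPath = new Map<string, Span[]>();
  for (const row of rows) {
    const spans = byPath.get(row.path) ?? [];
    spans.push([row.startByte, row.endByte]);
    byPath.set(row.path, spans);
  }
  const paths = [...byPath.keys()].sort();
  return {
    budgetTokens,
    selections: paths.map((path) => ({ path, ranges: mergeSpansSketch(byPath.get(path) ?? []) })),
  };
}

// Usage: two adjacent chunks of one file collapse to a single selected range.
const set = toDecisionSet(
  [
    { path: "a.ts", startByte: 0, endByte: 10, tokenCount: 3 },
    { path: "a.ts", startByte: 10, endByte: 20, tokenCount: 2 },
  ],
  100,
);
console.log(JSON.stringify(set.selections)); // [{"path":"a.ts","ranges":[[0,20]]}]
```

Sorting and merging before hashing matters: two packs that chunk the same bytes differently (one 20-byte chunk versus two 10-byte chunks) still project to the same selection.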
+
+> **The decision set** of a pack is the set of `(path, mergedByteRanges)` selections, taken under a given `budgetTokens`. Two packs are **decision-equivalent** iff their decision sets are equal — same paths, same merged byte ranges per path, same budget — regardless of `tokenCount`, `pins`, chunk text bytes, or serialization.
+
+Note (`ast-chunker.ts:30`): `startByte`/`endByte` are currently UTF-16 code-unit offsets stored under byte names (coincide with UTF-8 for ASCII). The comparator treats them as opaque offsets — equivalence is well-defined as long as both packs use the same convention, which they do. See Q1 for the optional line-granularity mode.
+
+## 3. The `decisionHash` — a normalized projection
+
+`replay` introduces a **`decisionHash`**: a hash over a canonical, incidental-free projection of the decision set.
+
+```
+decisionSet(pack) =
+  {
+    budgetTokens,                                 # the budget the selection was made under
+    selections: [                                 # sorted by path
+      { path, ranges: mergedByteRanges(path) }    # ranges = sorted non-overlapping [start,end)
+    ]
+  }
+decisionHash = sha256(canonicalJson(decisionSet)) # same RFC 8785 helper as packHash
+```
+
+Deliberately **excluded** (the whole point): `tokenCount` per chunk; `pins.chonkieVersion` / `pins.grammarCommits`; chunk *text bytes* and per-file `fileHash`; `commit` / `repoOriginUrl` (provenance — reported, not hashed).
+
+Deliberately **included**: `path` + merged byte ranges (the selection); `budgetTokens` (the constraint — different budgets are *expected* to differ; reported distinctly, not as a violation).
+
+**Source of ranges.** Prefer `ast-chunks.jsonl` `(startByte,endByte)` merged per path; fall back to `context-bom.json` `byteRanges` when ast-chunks is absent/degraded. They should agree; `replay` flags when they don't (a real internal-consistency bug signal).
+
+**Relationship to `packHash`:** `packHash` equality ⇒ `decisionHash` equality (cheap witness — matching bytes trivially match the decision). `decisionHash` equality does NOT require `packHash` equality (the contract tolerates incidental drift). Fast path: if `packHash` matches, report `EQUIVALENT` without computing the projection; else compute and compare `decisionHash`.
+
+## 4. The `codehub replay` command
+
+Two modes — extend the existing `replay ` self-check, add a two-pack compare:
+
+```
+codehub replay [--repo ] [--repack] [--json] [--budget-strict]   # self-check (extends e6a81c2)
+codehub replay --compare [--json] [--budget-strict]               # two-pack compare (new)
+```
+
+- **Self-check `replay `** — reads `/.codehub/packs//`. Integrity + recompute tiers (byte witness) stay. With `--repack`, re-pack the recorded `commit` and assert **decision-equivalence** (not byte-identity) against the stored pack. This is the structural analog of `codehub status`'s staleness record: "is this pack still the decision OCH would make today?"
+- **Two-pack `--compare A B`** — read two pack dirs, project each to its decision set, compare. The minimal unit that proves the projection; no store, no re-pack.
+- **Verdict** — `EQUIVALENT` (decision sets match) · `DIVERGED` (selections differ) · `BUDGET_MISMATCH` (different `budgetTokens` — reported distinctly; a violation only under `--budget-strict`).
+- **On `DIVERGED`** — structured diff: paths only in A, paths only in B, and per-path range deltas (ranges added/removed). This is the actionable output — *what the agent would have seen differently*.
+- **Exit code** — 0 on `EQUIVALENT`; non-zero on `DIVERGED` (and on `BUDGET_MISMATCH` only under `--budget-strict`). Usable as an on-demand structural gate.
+- **`--json`** — full record (verdict + both `decisionHash`es + `packHash`es + diff) on stdout; human summary on stderr (context-bom discipline). The record is a pure function of the inputs — no clock/run-id — so it serializes reproducibly.
+
+## 5. Where it lives + shape
+
+- **`@opencodehub/pack`** gains the projection: a pure `buildDecisionSet(astChunks, contextBom) → DecisionSet` + `decisionHash(DecisionSet) → string`, exported beside the existing builders. It belongs in `pack` because it reuses the same determinism machinery (`canonicalJson`, the BOM shapes) and `replay` is a *reader* of pack artifacts.
+- **CLI** `codehub replay` in `packages/cli/src/commands/replay.ts` (rebased from `e6a81c2`, comparator swapped), registered in `index.ts` next to `code-pack` (commander pattern; lazy `await import`).
+- **Determinism of the projection itself:** `decisionSet` is a pure function of the input artifacts (no clock, no env), serialized through the same RFC 8785 `canonicalJson`. Two `replay` runs over the same packs print byte-identical records.
+
+## 6. EARS requirements (draft — for review)
+
+- **R1** WHEN given two packs built from the same `(commit, budget, tokenizer)`, `replay` SHALL compute each pack's `decisionHash` (a hash over the normalized `(path, mergedByteRanges, budgetTokens)` projection) and report `EQUIVALENT` iff they match.
+- **R2** The `decisionHash` projection SHALL exclude `tokenCount`, `pins` (chonkie version, grammar commits), chunk text bytes, and per-file `fileHash`, so a toolchain-version bump that does not change the selection set yields the same `decisionHash`.
+- **R3** WHERE `packHash` of the two packs is equal, `replay` SHALL short-circuit to `EQUIVALENT` without recomputing the projection (byte-identity is a sufficient witness).
+- **R4** WHEN the decision sets differ, `replay` SHALL emit a structured diff naming paths present in only one pack and, for shared paths, the byte-range deltas — and SHALL exit non-zero.
+- **R5** WHEN the two packs were built under different `budgetTokens`, `replay` SHALL report `BUDGET_MISMATCH` distinctly from `DIVERGED`, exiting zero by default and non-zero only under `--budget-strict`.
+- **R6** The emitted `--json` record SHALL be a pure function of the inputs (no wall-clock/run-id), so the record serialization is reproducible.
+- **R7** `replay` SHALL derive ranges from `ast-chunks.jsonl` when present and fall back to `context-bom.json` `byteRanges` otherwise, and SHALL flag when the two disagree for the same pack.
+- **R8** The integrity + recompute tiers inherited from the `e6a81c2` `replay` (re-hash BOM bodies vs attested `fileHash`; recompute `packHash`) SHALL remain as the cheap byte-witness fast path; only the re-pack-equivalence comparator changes from byte-identity to decision-equivalence.
+
+## 7. Open questions for Laith (review before I build)
+
+1. **Byte ranges vs. line ranges.** ast-chunks records `(startByte, endByte)` as UTF-16 code-unit offsets today (`ast-chunker.ts:30`; coincide with UTF-8 for ASCII). Byte ranges are the precise contract; should I also offer a `--coarse` mode projecting to `(startLine, endLine)` for an encoding-robust, human-diffable view? I lean: byte ranges as the contract, line ranges as a reporting aid.
+2. **On-demand vs. CI gate.** `replay` is deterministic and cheap (pure read + hash), unlike the variance probe. I lean: ship on-demand in v1; later add an opt-in `analyze`-time "this commit's pack is decision-equivalent to the last" assertion — but only after we've seen real diffs in practice. Don't gate CI on it until the projection is trusted.
+3. **Does ADR 0020 retire the byte-identity gates, or layer over them?** I lean **layer, don't retire**: keep the `graphHash`/`packHash` byte-identity tests as the strict-witness fast path (cheap, valuable), and make decision-equivalence the *contract of record* that byte-identity is one way to satisfy. ADR 0020 reframes byte-identity from "the contract" (ROADMAP U1) to "a sufficient witness." Agree — or do you want the byte gates actually *relaxed* (e.g. let a pins-only delta pass the determinism gate)?
+4. **v1 scope.** Two-pack `--compare A B` is the minimum that proves the projection and reuses no store/analyze. The `replay --repack` self-check needs a `RepackDriver` (checkout + re-pack). Ship two-pack compare + the inherited integrity/recompute tiers in v1, and `--repack` decision-equivalence in v2? (I lean yes.)
+5. **Supersede or extend `e6a81c2`?** That branch's byte-identity `replay` is unmerged and 32 commits behind `main`. I lean: cherry-pick its scaffolding onto a fresh branch, drop the stale `duckdb_version`/`schemaVersion:1`, and land the decision-equivalence comparator in the same PR — so we don't carry two `replay`s. Agree?
+
+## 8. What this is NOT (scope guard)
+
+- Not the variance probe (spec 010 / Move 2 — behavioral half, shipped).
+- Not a re-implementation of packing — `replay` is a pure *reader* of pack directories (plus an optional re-pack tier in v2).
+- Not a CI gate in v1 — on-demand structural check (Q2).
+- Not a graph-diff tool — it compares *pack decision sets*, not raw graphs (`detect_changes` already maps diffs to symbols — a different question).
diff --git a/docs/adr/0020-decision-equivalence-supersedes-byte-identity.md b/docs/adr/0020-decision-equivalence-supersedes-byte-identity.md
new file mode 100644
index 0000000..5d557be
--- /dev/null
+++ b/docs/adr/0020-decision-equivalence-supersedes-byte-identity.md
@@ -0,0 +1,131 @@
+# ADR 0020 — Decision-equivalence is the pack contract; byte-identity is a witness, not the contract
+
+- Status: **Proposed** — 2026-06-30 (awaiting Laith's review; pairs with spec 011).
+- Authors: Laith Al-Saadoon + Bonk.
+- Branch: `spec/011-replay-decision-equivalence`.
+- Amends (does not supersede): the byte-identity invariant asserted in
+  [ADR 0011 — Graph database backend](./0011-graph-db-backend.md) (the `graphHash`
+  invariant) and [ADR 0019 — Single-file SQLite storage](./0019-single-file-sqlite-storage.md)
+  (the "graphHash byte-identity (the go/no-go)" gate), and the ROADMAP U1/U2
+  determinism constraints. Those gates **stay** — this ADR reframes what they
+  are *for*. It also supersedes the byte-identity comparator in the unmerged
+  `codehub replay` (`feat/v1-distribution-breadth`, `e6a81c2`).
+
+## Context
+
+The pack's reproducibility promise has been **byte-identity**: same inputs ⇒
+byte-identical artifact, witnessed by a hash. The chain:
+
+- **ROADMAP U1/U2** name "graphHash byte-identity per commit" and "deterministic
+  code-pack (same commit + tokenizer + budget → same bytes)" as the one
+  breaking-change budget OCH must preserve (`.erpaval/ROADMAP.md:201-202,219`).
+- **ADR 0011** defines `graphHash` as the SHA-256 of the canonical-JSON
+  `{edges, nodes}` projection and gates it in CI; **ADR 0019** makes
+  byte-identical rebuild the migration go/no-go.
+- The pack inherits it: `packHash = sha256(canonicalJson(manifest))`
+  (`packages/pack/src/manifest.ts:52`), and `pack-determinism.test.ts` asserts
+  two runs produce byte-identical BOM files.
+- The user-facing promise (`packages/pack/src/readme.ts:73`): *"same
+  `(commit, tokenizer_id, budget_tokens, chonkie_version, grammar_commits)`
+  produces a byte-identical pack and the same `pack_hash`."*
+
+Byte-identity is a good *witness* but the wrong *contract*, because the bytes
+bind things the auditor does not care about:
+
+1. **The `packHash` preimage includes incidental fields.** `pins.chonkieVersion`,
+   `pins.grammarCommits`, and every BOM file's `fileHash` are in the hash
+   (`manifest.ts:82-101`). A chonkie bump, a grammar-pin refresh, or a
+   `tokenCount` recompute flips `packHash` — even when the same byte ranges of
+   the same files were selected under the same budget. `readme.ts:73` literally
+   lists `chonkie_version` and `grammar_commits` as pack inputs, conceding that
+   a toolchain bump yields a "different" pack. The retrieval decision was
+   identical.
+
+2. **The embedder-swap precedent, stated precisely.** The #252 embedder swap
+   (gte-modernbert → F2LLM-v2-80M, 320-dim) is the canonical decision-irrelevant
+   change. Precision matters because the motivating prose (spec 010 §0)
+   over-stated the mechanism: embeddings are **not** in the pack — the Parquet
+   sidecar was dropped in ADR 0019, the BOM is **8 items**, and `graphHash` is
+   embedder-neutral by construction (ADR 0014: it hashes only `{nodes, edges}`,
+   never `store_meta`). So the swap breaks **neither** `packHash` **nor**
+   `graphHash` today; it invalidates the `embeddings` table and the `store_meta`
+   embedder fingerprint, forcing a re-index. The general lesson holds regardless:
+   a legitimate change to *how* OCH builds the index — a better embedder, a newer
+   grammar, a re-tokenizer — is exactly what a naive "did the bytes change?"
+   check misreads as "the pack changed," when which files/ranges the agent saw is
+   identical.
+
+3. **An auditor cares about the decision, not the bytes.** They want: did the
+   agent's context come from the right places? Byte-identity over-promises
+   (asserts more than the contract needs) and under-delivers (breaks on changes
+   the contract should tolerate).
+
+## Decision
+
+**The pack contract is decision-equivalence. Byte-identity is one sufficient
+witness of it, not the contract itself.**
+
+- **Contract of record (decision-equivalence):** two packs built from the same
+  inputs are equivalent iff they have the same **decision set** — the same
+  `(path, mergedByteRanges)` selections under the same `budgetTokens` —
+  regardless of `tokenCount`, `pins`, chunk text bytes, or serialization.
+- **Witness (byte-identity):** `packHash` equality ⇒ decision-equivalence
+  (matching bytes trivially match the decision). The existing `graphHash` /
+  `packHash` byte-identity gates **stay** as the cheap fast-path witness — they
+  are valuable and almost free. They are reframed from "the contract" to "a
+  sufficient condition for satisfying the contract."
+- **The decision set is a projection of existing artifacts**, not a new shape.
+  It is computed from `ast-chunks.jsonl` (`{path, startByte, endByte}` per chunk,
+  `ast-chunker.ts:68`) with `context-bom.json`'s merged `byteRanges`
+  (`context-bom.ts:170`) as the fallback/cross-check.
+- **`decisionHash`** is `sha256(canonicalJson(decisionSet))`, using the same
+  RFC 8785 `canonicalJson` helper as `packHash`. It deliberately **excludes**
+  `tokenCount`, `pins`, chunk text bytes, and per-file `fileHash`; it
+  **includes** `path`, merged byte ranges, and `budgetTokens`.
+- **`codehub replay`** is the structural assertion tool (spec 011): it compares
+  two packs' decision sets (or re-packs and compares against a stored pack),
+  reporting `EQUIVALENT` / `DIVERGED` / `BUDGET_MISMATCH` with a structured diff.
+  It supersedes the byte-identity comparator in the unmerged `e6a81c2` `replay`,
+  reusing that branch's integrity + recompute tiers as the byte-witness fast
+  path and swapping only the re-pack comparator.
+- **No gate is relaxed in this ADR.** The byte-identity CI gates continue to run
+  unchanged. Decision-equivalence is *added* as the contract they serve. Whether
+  to later let a pins-only delta pass the determinism gate (treating it as
+  decision-equivalent) is an explicit follow-up, not decided here (spec 011 Q3).
+
+## Consequences
+
+**Positive.**
+
+- The reproducibility claim becomes one OCH can defend against legitimate
+  toolchain evolution: "upgrade the chunker, swap the embedder, bump a grammar —
+  the pack's *decision* is provably unchanged," with `codehub replay` as the
+  receipt. This is the data-backed "how well does OCH do" story paired with the
+  Move 2 variance probe.
+- The contract stops over-promising. A grammar-pin bump no longer counts as "the
+  pack changed" to an auditor reading a hash.
+- `replay`'s diff output is actionable in a way a hash inequality never was: it
+  names *which files/ranges the agent would have seen differently*.
+
+**Negative / costs.**
+
+- A second hash (`decisionHash`) and a projection to maintain alongside
+  `packHash`. Mitigated: the projection is pure and small, lives in
+  `@opencodehub/pack` beside the builders, and reuses `canonicalJson`.
+- Two notions of "same pack" (byte-identical vs decision-equivalent) are a
+  distinction an operator must learn. Mitigated: `packHash` stays the default
+  identity in paths/UX; `decisionHash` surfaces only through `replay`.
+- The `ast-chunks` offsets are UTF-16 code-unit indices today
+  (`ast-chunker.ts:30`), not true UTF-8 byte offsets (coincide for ASCII).
+  Decision-equivalence is well-defined as long as both packs use the same
+  convention (they do); a future promotion to true byte offsets is a
+  cross-cutting change tracked separately.
+
+**Follow-ups (not decided here).**
+
+- Whether to relax the byte-identity CI gates to accept decision-equivalent
+  packs (e.g. a pins-only delta) — spec 011 Q3.
+- Whether `replay` becomes an `analyze`-time or CI assertion vs. staying
+  on-demand — spec 011 Q2.
+- Doc-drift cleanup: the ROADMAP and `code-pack` CLI description still say
+  "9-item BOM"; it has been 8 since ADR 0019.
From ed6ee27d46bcdf295a657ec63e9442a21886b764 Mon Sep 17 00:00:00 2001
From: Laith Al-Saadoon
Date: Tue, 30 Jun 2026 12:47:20 +0000
Subject: [PATCH 2/3] feat(pack): codehub replay — decision-equivalence structural check (Move 6)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implements spec 011 / ADR 0020 (the structural half of Move 6).

`codehub replay --compare ` asserts two packs are decision-equivalent: same files + byte ranges selected under the same budget, regardless of incidental drift (tokenCount, pins, chunk text bytes, fileHash). Byte-identity (packHash) stays the cheap sufficient witness; a decisionHash projection is the contract of record.

@opencodehub/pack — new decision-set module:
- decisionSetFromChunks / decisionSetFromByteRanges: project ast-chunks (path,startByte,endByte) or context-bom byteRanges to a normalized, incidental-free (path, mergedByteRanges, budget) set.
- decisionHash = sha256(canonicalJson(decisionSet)) — same RFC 8785 machinery as packHash; tokenCount-only drift is decision-equivalent.
- diffDecisionSets: structured diff (onlyInA / onlyInB / rangeDeltas) for the actionable DIVERGED output.

CLI — codehub replay --compare A B [--json] [--budget-strict]:
- Tiers (R8): integrity (re-hash BOM bodies vs attested fileHash) → packHash fast path (R3) → decision-equivalence projection.
- Verdict: EQUIVALENT / DIVERGED / BUDGET_MISMATCH / CORRUPT, with exit codes; --budget-strict promotes BUDGET_MISMATCH to failure.
- Manifest parser corrected for schema 2 (ADR 0019): no duckdb_version pin, reads budget_tokens. Reuses the byte-witness tier design from the unmerged e6a81c2 replay, swapping the comparator to decision-set.
- --json record is a pure function of the inputs (no clock/run-id, R6).

omnigent-style self-check (replay --repack) deferred to v2 per the approved spec; two-pack compare is the v1 unit that proves the projection.
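
A hedged sketch of the tier order listed above. `PackSummary`, `diffSketch`, `verdictSketch`, and `exitCodeSketch` are hypothetical names. The real comparator compares `decisionHash` values, whereas this sketch decides equivalence from an empty diff; the two agree only because both decision sets are assumed to be normalized (sorted paths, merged non-overlapping spans). Range deltas compare whole merged spans rather than individual bytes.

```typescript
// Hypothetical shapes for illustration only.
type Span = [start: number, end: number];

interface DecisionSet {
  budgetTokens: number;
  selections: { path: string; ranges: Span[] }[];
}

interface PackSummary {
  packHash: string;
  integrityDrift: string[]; // BOM items whose bytes no longer match their attested fileHash
  decisionSet: () => DecisionSet; // lazy, so the packHash fast path never projects
}

type Verdict = "EQUIVALENT" | "DIVERGED" | "BUDGET_MISMATCH" | "CORRUPT";

interface DecisionDiff {
  onlyInA: string[];
  onlyInB: string[];
  rangeDeltas: { path: string; added: Span[]; removed: Span[] }[];
}

// Structured diff over normalized sets; spans are compared as whole merged units.
function diffSketch(a: DecisionSet, b: DecisionSet): DecisionDiff {
  const key = (s: Span) => `${s[0]}:${s[1]}`;
  const mapA = new Map(a.selections.map((s): [string, Span[]] => [s.path, s.ranges]));
  const mapB = new Map(b.selections.map((s): [string, Span[]] => [s.path, s.ranges]));
  const onlyInA = [...mapA.keys()].filter((p) => !mapB.has(p)).sort();
  const onlyInB = [...mapB.keys()].filter((p) => !mapA.has(p)).sort();
  const rangeDeltas: DecisionDiff["rangeDeltas"] = [];
  for (const path of [...mapA.keys()].sort()) {
    const rangesA = mapA.get(path) ?? [];
    const rangesB = mapB.get(path);
    if (rangesB === undefined) continue; // already reported in onlyInA
    const inA = new Set(rangesA.map(key));
    const inB = new Set(rangesB.map(key));
    const added = rangesB.filter((s) => !inA.has(key(s)));
    const removed = rangesA.filter((s) => !inB.has(key(s)));
    if (added.length > 0 || removed.length > 0) rangeDeltas.push({ path, added, removed });
  }
  return { onlyInA, onlyInB, rangeDeltas };
}

// Tier order: integrity, then the packHash witness, then the decision-set contract.
function verdictSketch(a: PackSummary, b: PackSummary): { verdict: Verdict; diff?: DecisionDiff } {
  if (a.integrityDrift.length > 0 || b.integrityDrift.length > 0) return { verdict: "CORRUPT" };
  if (a.packHash === b.packHash) return { verdict: "EQUIVALENT" }; // sufficient witness
  const setA = a.decisionSet();
  const setB = b.decisionSet();
  if (setA.budgetTokens !== setB.budgetTokens) return { verdict: "BUDGET_MISMATCH" };
  const diff = diffSketch(setA, setB);
  const same = diff.onlyInA.length === 0 && diff.onlyInB.length === 0 && diff.rangeDeltas.length === 0;
  return same ? { verdict: "EQUIVALENT" } : { verdict: "DIVERGED", diff };
}

// Exit codes: EQUIVALENT exits 0; BUDGET_MISMATCH exits 0 unless --budget-strict.
function exitCodeSketch(verdict: Verdict, budgetStrict: boolean): number {
  if (verdict === "EQUIVALENT") return 0;
  if (verdict === "BUDGET_MISMATCH") return budgetStrict ? 1 : 0;
  return 1; // DIVERGED, CORRUPT
}

// Usage: same budget, different merged range for a.ts.
const pa: PackSummary = {
  packHash: "hashA",
  integrityDrift: [],
  decisionSet: () => ({ budgetTokens: 100, selections: [{ path: "a.ts", ranges: [[0, 10]] }] }),
};
const pb: PackSummary = {
  ...pa,
  packHash: "hashB",
  decisionSet: () => ({ budgetTokens: 100, selections: [{ path: "a.ts", ranges: [[0, 20]] }] }),
};
const r = verdictSketch(pa, pb);
console.log(r.verdict, exitCodeSketch(r.verdict, false)); // DIVERGED 1
```

Checking `packHash` before projecting is what keeps byte-identity a cheap sufficient witness: the projection runs only when the bytes differ.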

Spec 011 + ADR 0020 carried on this branch. +29 tests (14 pack, 15 CLI).
---
 packages/cli/src/commands/replay.test.ts | 254 ++++++++++++++++
 packages/cli/src/commands/replay.ts | 361 +++++++++++++++++++++++
 packages/cli/src/index.ts | 36 +++
 packages/pack/src/decision-set.test.ts | 128 ++++++++
 packages/pack/src/decision-set.ts | 183 ++++++++++++
 packages/pack/src/index.ts | 11 +
 6 files changed, 973 insertions(+)
 create mode 100644 packages/cli/src/commands/replay.test.ts
 create mode 100644 packages/cli/src/commands/replay.ts
 create mode 100644 packages/pack/src/decision-set.test.ts
 create mode 100644 packages/pack/src/decision-set.ts

diff --git a/packages/cli/src/commands/replay.test.ts b/packages/cli/src/commands/replay.test.ts
new file mode 100644
index 0000000..d373f8e
--- /dev/null
+++ b/packages/cli/src/commands/replay.test.ts
@@ -0,0 +1,254 @@
+/**
+ * Tests for `codehub replay --compare` (decision-equivalence).
+ *
+ * Strategy: the comparator (`runReplayCompare`) is exercised via the
+ * `_loadPack` seam with hand-built `LoadedPack`s — no filesystem. `loadPack`
+ * itself is tested against a real on-disk pack directory (manifest +
+ * ast-chunks + context-bom) so the snake_case parsing + integrity tier + the
+ * JSONL/CycloneDX parsers are covered end-to-end.
+ */
+
+import { strict as assert } from "node:assert";
+import { createHash } from "node:crypto";
+import { mkdtemp, rm, writeFile } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { after, before, describe, it } from "node:test";
+import {
+  type LoadedPack,
+  loadPack,
+  packDecisionSet,
+  replayVerdictLine,
+  runReplayCompare,
+  serializeReplayRecord,
+} from "./replay.js";
+
+const sha = (s: string) => createHash("sha256").update(s).digest("hex");
+
+/** Build a LoadedPack with given chunks; manifest packHash defaults distinct. */
+function pack(over: Partial<LoadedPack> & { packHash: string; budget: number }): LoadedPack {
+  return {
+    dir: `/fake/${over.packHash}`,
+    manifest: {
+      packHash: over.packHash,
+      budgetTokens: over.budget,
+      commit: "c0ffee",
+      files: [],
+    },
+    chunks: over.chunks ?? [],
+    byteRangesByPath: over.byteRangesByPath ?? new Map(),
+    integrityDrift: over.integrityDrift ?? [],
+  };
+}
+
+const chunk = (path: string, startByte: number, endByte: number) => ({ path, startByte, endByte });
+
+describe("runReplayCompare (seamed)", () => {
+  async function compare(a: LoadedPack, b: LoadedPack) {
+    const byDir = new Map([
+      [a.dir, a],
+      [b.dir, b],
+    ]);
+    return runReplayCompare(a.dir, b.dir, {
+      _loadPack: async (dir) => {
+        const p = byDir.get(dir);
+        if (p === undefined) throw new Error(`no fake pack at ${dir}`);
+        return p;
+      },
+    });
+  }
+
+  it("EQUIVALENT via packHash fast path when hashes match (no projection needed)", async () => {
+    const a = pack({ packHash: "same", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
+    const b = pack({ packHash: "same", budget: 100, chunks: [chunk("a.ts", 0, 99)] });
+    const r = await compare(a, b);
+    assert.equal(r.verdict, "EQUIVALENT");
+    assert.equal(r.decisionHashA, undefined, "fast path skips the projection");
+  });
+
+  it("EQUIVALENT when packHashes differ but the decision set matches (the contract)", async () => {
+    // Same selection, different incidental bytes → different packHash, same decision.
+    const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
+    const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
+    const r = await compare(a, b);
+    assert.equal(r.verdict, "EQUIVALENT");
+    assert.equal(r.decisionHashA, r.decisionHashB, "decision hashes match");
+  });
+
+  it("DIVERGED with a structured diff when selections differ", async () => {
+    const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
+    const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 20)] });
+    const r = await compare(a, b);
+    assert.equal(r.verdict, "DIVERGED");
+    assert.ok(r.diff !== undefined);
+    assert.equal(r.diff?.rangeDeltas[0]?.path, "a.ts");
+    assert.notEqual(r.decisionHashA, r.decisionHashB);
+  });
+
+  it("BUDGET_MISMATCH when budgets differ (reported distinctly, not DIVERGED)", async () => {
+    const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
+    const b = pack({ packHash: "hashB", budget: 200, chunks: [chunk("a.ts", 0, 10)] });
+    const r = await compare(a, b);
+    assert.equal(r.verdict, "BUDGET_MISMATCH");
+    assert.equal(r.budgetA, 100);
+    assert.equal(r.budgetB, 200);
+  });
+
+  it("CORRUPT when either pack has integrity drift (refuses to compare)", async () => {
+    const a = pack({ packHash: "hashA", budget: 100, integrityDrift: ["ast-chunks.jsonl"] });
+    const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
+    const r = await compare(a, b);
+    assert.equal(r.verdict, "CORRUPT");
+    assert.deepEqual(r.corruptItems, ["ast-chunks.jsonl"]);
+  });
+
+  it("falls back to context-bom byteRanges when ast-chunks is empty (R7)", async () => {
+    const a = pack({
+      packHash: "hashA",
+      budget: 100,
+      byteRangesByPath: new Map([["a.ts", [{ start: 0, end: 10 }]]]),
+    });
+    const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
+    const r = await compare(a, b);
+    assert.equal(r.verdict, "EQUIVALENT", "byteRanges fallback == equivalent chunks");
+  });
+});
+
+describe("replayVerdictLine exit codes", () => {
+  const base = { packHashA: "a", packHashB: "b", budgetA: 100, budgetB: 100 } as const;
+
+  it("EQUIVALENT → exit 0", () => {
+    assert.equal(replayVerdictLine({ verdict: "EQUIVALENT", ...base }, false).exitCode, 0);
+  });
+  it("DIVERGED → exit 1", () => {
+    assert.equal(replayVerdictLine({ verdict: "DIVERGED", ...base }, false).exitCode, 1);
+  });
+  it("CORRUPT → exit 1", () => {
+    assert.equal(
+      replayVerdictLine({ verdict: "CORRUPT", ...base, corruptItems: ["x"] }, false).exitCode,
+      1,
+    );
+  });
+  it("BUDGET_MISMATCH → exit 0 by default, 1 under --budget-strict", () => {
+    const r = { verdict: "BUDGET_MISMATCH", ...base, budgetB: 200 } as const;
+    assert.equal(replayVerdictLine(r, false).exitCode, 0);
+    assert.equal(replayVerdictLine(r, true).exitCode, 1);
+  });
+});
+
+describe("serializeReplayRecord (R6 determinism)", () => {
+  it("is byte-identical across calls and carries no clock/run-id", () => {
+    const r = {
+      verdict: "DIVERGED" as const,
+      packHashA: "a",
+      packHashB: "b",
+      decisionHashA: "da",
+      decisionHashB: "db",
+      budgetA: 100,
+      budgetB: 100,
+      diff: { equivalent: false, onlyInA: ["x.ts"], onlyInB: [], rangeDeltas: [] },
+    };
+    const j1 = serializeReplayRecord(r);
+    const j2 = serializeReplayRecord(r);
+    assert.equal(j1, j2);
+    assert.ok(!j1.includes("timestamp") && !j1.includes("Date"));
+  });
+});
+
+describe("packDecisionSet (projection precedence)", () => {
+  it("prefers ast-chunks over context-bom byteRanges", () => {
+    const p = pack({
+      packHash: "h",
+      budget: 100,
+      chunks: [chunk("a.ts", 0, 10)],
+      byteRangesByPath: new Map([["zzz.ts", [{ start: 0, end: 999 }]]]),
+    });
+    const set = packDecisionSet(p);
+    assert.deepEqual(
+      set.selections.map((s) => s.path),
+      ["a.ts"],
+      "ast-chunks wins; context-bom ignored when chunks present",
+    );
+  });
+});
+
+describe("loadPack (real on-disk)", () => {
+  let dir: string;
+  before(async () => {
+    dir = await mkdtemp(join(tmpdir(), "och-replay-pack-"));
+    // ast-chunks.jsonl — one canonical-JSON AstChunk per line.
+    const astChunks = [
+      JSON.stringify({ path: "a.ts", startByte: 0, endByte: 10, tokenCount: 3 }),
+      JSON.stringify({ path: "a.ts", startByte: 10, endByte: 20, tokenCount: 2 }),
+      "",
+    ].join("\n");
+    await writeFile(join(dir, "ast-chunks.jsonl"), astChunks, "utf8");
+    // context-bom.json — CycloneDX with an opencodehub:byteRanges property.
+    const contextBom = JSON.stringify({
+      bomFormat: "CycloneDX",
+      specVersion: "1.6",
+      components: [
+        {
+          type: "file",
+          name: "a.ts",
+          properties: [{ name: "opencodehub:byteRanges", value: JSON.stringify([[0, 20]]) }],
+        },
+      ],
+    });
+    await writeFile(join(dir, "context-bom.json"), contextBom, "utf8");
+    // manifest.json — snake_case wire form. fileHashes match the bodies above.
+    const manifest = JSON.stringify({
+      budget_tokens: 100,
+      commit: "c0ffee",
+      determinism_class: "strict",
+      files: [
+        { kind: "ast-chunks", path: "ast-chunks.jsonl", file_hash: sha(astChunks) },
+        { kind: "context-bom", path: "context-bom.json", file_hash: sha(contextBom) },
+      ],
+      pack_hash: "deadbeef",
+      schema_version: 2,
+    });
+    await writeFile(join(dir, "manifest.json"), manifest, "utf8");
+  });
+  after(async () => {
+    await rm(dir, { recursive: true, force: true });
+  });
+
+  it("parses manifest (schema 2, no duckdb pin), ast-chunks, and context-bom ranges", async () => {
+    const loaded = await loadPack(dir);
+    assert.equal(loaded.manifest.packHash, "deadbeef");
+    assert.equal(loaded.manifest.budgetTokens, 100);
+    assert.equal(loaded.chunks.length, 2, "two ast-chunk rows parsed (blank line skipped)");
+    assert.equal(loaded.byteRangesByPath.get("a.ts")?.[0]?.end, 20);
+    assert.equal(loaded.integrityDrift.length, 0, "on-disk bytes match attested fileHashes");
+  });
+
+  it("flags integrity drift when a body's bytes don't match its attested hash", async () => {
+    // Rewrite the manifest with a wrong fileHash for
ast-chunks. + const badManifest = JSON.stringify({ + budget_tokens: 100, + commit: "c0ffee", + determinism_class: "strict", + files: [{ kind: "ast-chunks", path: "ast-chunks.jsonl", file_hash: "0".repeat(64) }], + pack_hash: "deadbeef", + schema_version: 2, + }); + const badDir = await mkdtemp(join(tmpdir(), "och-replay-bad-")); + try { + await writeFile( + join(badDir, "ast-chunks.jsonl"), + '{"path":"a.ts","startByte":0,"endByte":1}', + "utf8", + ); + await writeFile(join(badDir, "manifest.json"), badManifest, "utf8"); + const loaded = await loadPack(badDir); + assert.deepEqual(loaded.integrityDrift, ["ast-chunks.jsonl"]); + } finally { + await rm(badDir, { recursive: true, force: true }); + } + }); + + it("throws a clear error when the pack dir has no manifest", async () => { + await assert.rejects(() => loadPack(join(tmpdir(), "no-such-pack-dir")), /no pack at/); + }); +}); diff --git a/packages/cli/src/commands/replay.ts b/packages/cli/src/commands/replay.ts new file mode 100644 index 0000000..634d8a7 --- /dev/null +++ b/packages/cli/src/commands/replay.ts @@ -0,0 +1,361 @@ +/** + * `codehub replay --compare ` — assert two packs are + * decision-equivalent (spec 011 / ADR 0020). + * + * Decision-equivalence (the contract of record): two packs built from the same + * inputs are equivalent iff they select the **same decision set** — the same + * files + byte ranges, under the same budget — regardless of `tokenCount`, + * `pins`, chunk text, or serialization. Byte-identity (`packHash`) stays the + * cheap *sufficient witness*: if the two `packHash`es match, the decision + * trivially matches and we short-circuit (R3). + * + * Tiers (R8 — the cheap byte-witness layers from the prior byte-identity + * `replay` are kept, only the equivalence comparator changed to decision-set): + * 1. **Integrity** (always, offline): re-hash every BOM body on disk vs its + * attested `fileHash` in `manifest.json`. 
A drifted/corrupt pack is + * reported before any comparison — you can't compare a tampered pack. + * 2. **packHash fast path:** equal `packHash` ⇒ `EQUIVALENT` immediately. + * 3. **decision-equivalence:** project each pack to its decision set + * (ast-chunks preferred, context-bom `byteRanges` fallback — R7) and + * compare. Different `budgetTokens` ⇒ `BUDGET_MISMATCH` (R5). + * + * `console.log` to stdout is sanctioned in command modules (biome override); + * the JSON record goes to stdout, the human summary to stderr (the context-bom + * discipline). + */ + +import { createHash } from "node:crypto"; +import { existsSync } from "node:fs"; +import { readFile } from "node:fs/promises"; +import { join, resolve } from "node:path"; +import { canonicalJson } from "@opencodehub/core-types"; +import { + type DecisionDiff, + type DecisionSet, + decisionHash, + decisionSetFromByteRanges, + decisionSetFromChunks, + diffDecisionSets, +} from "@opencodehub/pack"; + +/** Minimal manifest fields `replay` reads (corrected for schema 2 — ADR 0019). */ +interface ReplayManifest { + readonly packHash: string; + readonly budgetTokens: number; + readonly commit: string; + readonly files: ReadonlyArray<{ + readonly kind: string; + readonly path: string; + readonly fileHash: string; + }>; +} + +/** A chunk row read from `ast-chunks.jsonl`. */ +interface AstChunkRow { + readonly path: string; + readonly startByte: number; + readonly endByte: number; +} + +/** Everything `replay` reads from one pack directory. */ +export interface LoadedPack { + readonly dir: string; + readonly manifest: ReplayManifest; + /** ast-chunks rows (empty when the file is absent/empty — production default). */ + readonly chunks: readonly AstChunkRow[]; + /** Per-path merged byte ranges parsed from context-bom.json. */ + readonly byteRangesByPath: ReadonlyMap>; + /** Integrity-tier drift: BOM bodies whose on-disk bytes ≠ attested fileHash. 
*/ + readonly integrityDrift: readonly string[]; +} + +export type ReplayVerdict = "EQUIVALENT" | "DIVERGED" | "BUDGET_MISMATCH" | "CORRUPT"; + +export interface ReplayResult { + readonly verdict: ReplayVerdict; + readonly packHashA: string; + readonly packHashB: string; + /** Decision hashes; set only when the decision-set tier ran (absent on the packHash fast path, CORRUPT, and BUDGET_MISMATCH). */ + readonly decisionHashA?: string; + readonly decisionHashB?: string; + readonly budgetA: number; + readonly budgetB: number; + /** The structured diff, present on DIVERGED. */ + readonly diff?: DecisionDiff; + /** Integrity drift surfaced from either pack (present on CORRUPT). */ + readonly corruptItems?: readonly string[]; +} + +export interface ReplayCompareArgs { + /** Test seam — inject a pack loader so tests skip the filesystem. */ + readonly _loadPack?: (dir: string) => Promise<LoadedPack>; +} + +/** + * Compare two pack directories for decision-equivalence. Pure given the loaded + * packs; the loader (default {@link loadPack}) is the only I/O. + */ +export async function runReplayCompare( + packDirA: string, + packDirB: string, + args: ReplayCompareArgs = {}, +): Promise<ReplayResult> { + const load = args._loadPack ?? loadPack; + const a = await load(resolve(packDirA)); + const b = await load(resolve(packDirB)); + + // Tier 1: integrity. A pack whose bytes disagree with its own manifest is + // corrupt — refuse to compare it (the comparison would be meaningless). + const corrupt = [...a.integrityDrift, ...b.integrityDrift]; + if (corrupt.length > 0) { + return { + verdict: "CORRUPT", + packHashA: a.manifest.packHash, + packHashB: b.manifest.packHash, + budgetA: a.manifest.budgetTokens, + budgetB: b.manifest.budgetTokens, + corruptItems: corrupt, + }; + } + + // Tier 2: packHash fast path (R3) — byte-identity is a sufficient witness.
+ if (a.manifest.packHash === b.manifest.packHash) { + return { + verdict: "EQUIVALENT", + packHashA: a.manifest.packHash, + packHashB: b.manifest.packHash, + budgetA: a.manifest.budgetTokens, + budgetB: b.manifest.budgetTokens, + }; + } + + // Packs built under different budgets are expected to select differently, so + // report that distinctly (R5), before the decision diff: a budget change is + // not a contract violation. + if (a.manifest.budgetTokens !== b.manifest.budgetTokens) { + return { + verdict: "BUDGET_MISMATCH", + packHashA: a.manifest.packHash, + packHashB: b.manifest.packHash, + budgetA: a.manifest.budgetTokens, + budgetB: b.manifest.budgetTokens, + }; + } + + // Tier 3: decision-equivalence. + const setA = packDecisionSet(a); + const setB = packDecisionSet(b); + const diff = diffDecisionSets(setA, setB); + return { + verdict: diff.equivalent ? "EQUIVALENT" : "DIVERGED", + packHashA: a.manifest.packHash, + packHashB: b.manifest.packHash, + decisionHashA: decisionHash(setA), + decisionHashB: decisionHash(setB), + budgetA: a.manifest.budgetTokens, + budgetB: b.manifest.budgetTokens, + ...(diff.equivalent ? {} : { diff }), + }; +} + +/** + * Project a loaded pack to its decision set: ast-chunks preferred, context-bom + * `byteRanges` fallback (R7). Exported for tests. + */ +export function packDecisionSet(pack: LoadedPack): DecisionSet { + if (pack.chunks.length > 0) { + return decisionSetFromChunks(pack.chunks, pack.manifest.budgetTokens); + } + return decisionSetFromByteRanges(pack.byteRangesByPath, pack.manifest.budgetTokens); +} + +/** + * Load + parse a pack directory: manifest.json (snake_case → camelCase), + * ast-chunks.jsonl (JSONL), context-bom.json (CycloneDX byteRanges), and run + * the integrity tier (re-hash bodies vs attested fileHash). 
+ */ +export async function loadPack(dir: string): Promise<LoadedPack> { + const manifestPath = join(dir, "manifest.json"); + if (!existsSync(manifestPath)) { + throw new Error( + `codehub replay: no pack at ${dir} (missing manifest.json).
` + "Pass a .codehub/packs// directory produced by `codehub code-pack`.", + ); + } + const manifest = parseManifest(await readFile(manifestPath, "utf8")); + + // Integrity tier: re-hash each BOM body on disk vs its attested digest. + const integrityDrift: string[] = []; + for (const f of manifest.files) { + const bodyPath = join(dir, f.path); + if (!existsSync(bodyPath)) { + integrityDrift.push(f.path); + continue; + } + const recomputed = sha256HexBytes(await readFile(bodyPath)); + if (recomputed !== f.fileHash) integrityDrift.push(f.path); + } + + const chunks = await loadAstChunks(dir); + const byteRangesByPath = await loadContextBomRanges(dir); + return { dir, manifest, chunks, byteRangesByPath, integrityDrift }; +} + +/** + * Parse the on-disk snake_case manifest into the fields `replay` needs. + * Corrected for schema 2 (ADR 0019): no `duckdb_version` pin, `budget_tokens` + * is read for the decision set. + */ +function parseManifest(json: string): ReplayManifest { + const w = JSON.parse(json) as Record<string, unknown>; + const files = (w["files"] ?? []) as Array<Record<string, unknown>>; + return { + packHash: String(w["pack_hash"] ?? ""), + budgetTokens: Number(w["budget_tokens"] ?? 0), + commit: String(w["commit"] ?? ""), + files: files.map((f) => ({ + kind: String(f["kind"] ?? ""), + path: String(f["path"] ?? ""), + fileHash: String(f["file_hash"] ?? ""), + })), + }; +} + +/** Read `ast-chunks.jsonl` (one canonical-JSON AstChunk per line). Absent → []. */ +async function loadAstChunks(dir: string): Promise<AstChunkRow[]> { + const p = join(dir, "ast-chunks.jsonl"); + if (!existsSync(p)) return []; + const text = await readFile(p, "utf8"); + const rows: AstChunkRow[] = []; + for (const line of text.split("\n")) { + const trimmed = line.trim(); + if (trimmed.length === 0) continue; + const row = JSON.parse(trimmed) as Record<string, unknown>; + rows.push({ + path: String(row["path"] ?? ""), + startByte: Number(row["startByte"] ?? 0), + endByte: Number(row["endByte"] ??
0), + }); + } + return rows; +} + +/** + * Read `context-bom.json` and extract per-path byte ranges from the + * `opencodehub:byteRanges` property (a JSON-stringified `[[start,end],...]`). + * Absent → empty map. + */ +async function loadContextBomRanges( + dir: string, +): Promise<ReadonlyMap<string, ReadonlyArray<{ start: number; end: number }>>> { + const p = join(dir, "context-bom.json"); + const out = new Map<string, ReadonlyArray<{ start: number; end: number }>>(); + if (!existsSync(p)) return out; + const doc = JSON.parse(await readFile(p, "utf8")) as { + components?: ReadonlyArray<{ + name?: unknown; + properties?: ReadonlyArray<{ name?: unknown; value?: unknown }>; + }>; + }; + for (const c of doc.components ?? []) { + const path = typeof c.name === "string" ? c.name : undefined; + if (path === undefined) continue; + const prop = (c.properties ?? []).find((x) => x.name === "opencodehub:byteRanges"); + if (prop === undefined || typeof prop.value !== "string") continue; + let pairs: unknown; + try { + pairs = JSON.parse(prop.value); + } catch { + continue; + } + if (!Array.isArray(pairs)) continue; + const ranges: { start: number; end: number }[] = []; + for (const pair of pairs) { + if (Array.isArray(pair) && pair.length === 2) { + ranges.push({ start: Number(pair[0]), end: Number(pair[1]) }); + } + } + if (ranges.length > 0) out.set(path, ranges); + } + return out; +} + +/** + * Render a {@link ReplayResult} to a one-line-plus-detail human summary and an + * exit code. Exported so the CLI action stays a thin shim and the mapping is + * unit-testable. + */ +export function replayVerdictLine( + r: ReplayResult, + budgetStrict: boolean, +): { line: string; exitCode: number } { + switch (r.verdict) { + case "EQUIVALENT": + return { line: "codehub replay: EQUIVALENT — same decision set", exitCode: 0 }; + case "BUDGET_MISMATCH": { + const line = `codehub replay: BUDGET_MISMATCH — A budget=${r.budgetA}, B budget=${r.budgetB} (decision sets not comparable under different budgets)`; + return { line, exitCode: budgetStrict ?
1 : 0 }; + } + case "CORRUPT": + return { + line: `codehub replay: CORRUPT — on-disk bytes drifted from the manifest for: ${(r.corruptItems ?? []).join(", ")}`, + exitCode: 1, + }; + case "DIVERGED": + return { line: formatDivergedSummary(r), exitCode: 1 }; + } +} + +/** Multi-line human summary of a DIVERGED verdict (the actionable diff). */ +function formatDivergedSummary(r: ReplayResult): string { + const lines: string[] = ["codehub replay: DIVERGED — the packs select different decision sets"]; + const diff = r.diff; + if (diff !== undefined) { + for (const p of diff.onlyInA) lines.push(` only in A: ${p}`); + for (const p of diff.onlyInB) lines.push(` only in B: ${p}`); + for (const d of diff.rangeDeltas) { + lines.push(` ranges differ: ${d.path} A=${fmtRanges(d.a)} B=${fmtRanges(d.b)}`); + } + } + return lines.join("\n"); +} + +function fmtRanges(ranges: ReadonlyArray<readonly [number, number]>): string { + return `[${ranges.map(([s, e]) => `${s}-${e}`).join(",")}]`; +} + +/** + * Print a {@link ReplayResult}: with `--json`, the full record goes to stdout + * (machine consumers); otherwise the human summary goes to stderr, so it never + * pollutes a piped stdout. + */ +export function printReplayResult(r: ReplayResult, asJson: boolean, budgetStrict: boolean): void { + const { line } = replayVerdictLine(r, budgetStrict); + if (asJson) { + console.log(serializeReplayRecord(r)); + } else { + console.warn(line); + } +} + +/** Canonical JSON of the replay record — pure function of the inputs (R6). */ +export function serializeReplayRecord(r: ReplayResult): string { + // Reuse the decision-set canonical serializer's discipline by hand-building a + // stable object; the record carries no clock/run-id, so it is reproducible.
+ const record: Record<string, unknown> = { + verdict: r.verdict, + packHashA: r.packHashA, + packHashB: r.packHashB, + budgetA: r.budgetA, + budgetB: r.budgetB, + }; + if (r.decisionHashA !== undefined) record["decisionHashA"] = r.decisionHashA; + if (r.decisionHashB !== undefined) record["decisionHashB"] = r.decisionHashB; + if (r.diff !== undefined) record["diff"] = r.diff; + if (r.corruptItems !== undefined) record["corruptItems"] = r.corruptItems; + // The same RFC 8785 helper that backs packHash — sorts keys + normalizes + // numbers, so the record serializes byte-identically given the same inputs. + return canonicalJson(record); +} + +function sha256HexBytes(bytes: Uint8Array): string { + return createHash("sha256").update(bytes).digest("hex"); +} diff --git a/packages/cli/src/index.ts b/packages/cli/src/index.ts index 3b4d6f1..418b611 100644 --- a/packages/cli/src/index.ts +++ b/packages/cli/src/index.ts @@ -453,6 +453,42 @@ program } }); +program + .command("replay") + .description( + "Assert two code-packs are decision-equivalent (spec 011 / ADR 0020): same files + byte " + + "ranges selected under the same budget, regardless of incidental drift (tokenCount, pins, " + + "chunk text). packHash equality is the cheap witness; a decisionHash projection is the " + + "contract. Verdict: EQUIVALENT / DIVERGED / BUDGET_MISMATCH / CORRUPT. On-demand, never a CI gate.", + ) + .requiredOption( + "--compare ", + "Two pack directories (.codehub/packs//) to compare for decision-equivalence", + ) + .option( + "--json", + "Emit the full replay record (verdict + decisionHashes + diff) as JSON on stdout", + ) + .option( + "--budget-strict", + "Treat a BUDGET_MISMATCH (different --budget between the packs) as a failure exit", + ) + .action(async (opts: Record<string, unknown>) => { + const mod = await import("./commands/replay.js"); + const packs = Array.isArray(opts["compare"]) ?
(opts["compare"] as string[]) : []; + if (packs.length !== 2) { + throw new Error( + `codehub replay --compare expects exactly two pack directories, got ${packs.length}.`, + ); + } + const budgetStrict = opts["budgetStrict"] === true; + const [packA, packB] = packs as [string, string]; + const result = await mod.runReplayCompare(packA, packB); + mod.printReplayResult(result, opts["json"] === true, budgetStrict); + const { exitCode } = mod.replayVerdictLine(result, budgetStrict); + if (exitCode !== 0) process.exitCode = exitCode; + }); + program .command("query ") .description("Direct hybrid search against a repo's graph") diff --git a/packages/pack/src/decision-set.test.ts b/packages/pack/src/decision-set.test.ts new file mode 100644 index 0000000..47fbff1 --- /dev/null +++ b/packages/pack/src/decision-set.test.ts @@ -0,0 +1,128 @@ +import { strict as assert } from "node:assert"; +import { describe, it } from "node:test"; +import type { ByteSpan } from "./context-bom.js"; +import { + canonicalDecisionSet, + type DecisionSet, + decisionHash, + decisionSetFromByteRanges, + decisionSetFromChunks, + diffDecisionSets, +} from "./decision-set.js"; + +const chunk = (path: string, startByte: number, endByte: number, tokenCount = 1) => ({ + path, + startByte, + endByte, + tokenCount, +}); + +describe("decisionSetFromChunks", () => { + it("groups by path, merges adjacent/overlapping spans, sorts paths", () => { + const set = decisionSetFromChunks( + [ + chunk("b.ts", 10, 20), + chunk("a.ts", 0, 10), + chunk("a.ts", 10, 25), // adjacent to [0,10) → merges to [0,25) + chunk("b.ts", 0, 10), + ], + 100, + ); + assert.equal(set.budgetTokens, 100); + assert.deepEqual( + set.selections.map((s) => s.path), + ["a.ts", "b.ts"], + "paths sorted ASC", + ); + assert.deepEqual(set.selections[0]?.ranges, [[0, 25]], "a.ts spans merged"); + assert.deepEqual(set.selections[1]?.ranges, [[0, 20]], "b.ts spans merged"); + }); + + it("EXCLUDES tokenCount — a tokenCount-only drift is 
decision-equivalent", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10, 3)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 10, 999)], 100); + assert.equal(decisionHash(a), decisionHash(b), "tokenCount not in the projection"); + }); + + it("drops a path whose spans are all zero-length / inverted", () => { + const set = decisionSetFromChunks([chunk("a.ts", 5, 5), chunk("a.ts", 9, 3)], 100); + assert.equal(set.selections.length, 0, "no real ranges → not a selection"); + }); +}); + +describe("decisionSetFromByteRanges (context-bom fallback)", () => { + it("produces the same decision set as the equivalent chunks", () => { + const fromChunks = decisionSetFromChunks([chunk("a.ts", 0, 10), chunk("a.ts", 10, 20)], 100); + const ranges = new Map<string, ByteSpan[]>([["a.ts", [{ start: 0, end: 20 }]]]); + const fromRanges = decisionSetFromByteRanges(ranges, 100); + assert.equal(decisionHash(fromChunks), decisionHash(fromRanges)); + }); +}); + +describe("decisionHash", () => { + it("is stable across two calls (pure)", () => { + const set = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + assert.equal(decisionHash(set), decisionHash(set)); + }); + + it("differs when the selected byte ranges differ", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 12)], 100); + assert.notEqual(decisionHash(a), decisionHash(b)); + }); + + it("differs when the budget differs (budget is part of the decision)", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 10)], 200); + assert.notEqual(decisionHash(a), decisionHash(b)); + }); + + it("is independent of input chunk order (grouping is order-free)", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 5), chunk("b.ts", 0, 5)], 100); + const b = decisionSetFromChunks([chunk("b.ts", 0, 5), chunk("a.ts", 0, 5)], 100); + assert.equal(decisionHash(a), decisionHash(b)); + }); +}); +
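The tests above pin down the projection's contract (merge adjacent or overlapping spans, drop empty or inverted ones, ignore input order, fold the budget into the hash) without showing `mergeSpans`, which lives in `context-bom.ts` outside this diff. What follows is a minimal self-contained sketch of the same pipeline, under stated assumptions: `mergeSpansSketch`, `canonicalSketch`, and `decisionHashSketch` are hypothetical names, the merge semantics are inferred from the tests rather than copied from `mergeSpans`, and the key-sorting serializer is a simplification, not the RFC 8785 `canonicalJson` from `@opencodehub/core-types`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-ins; not the pack package's real exports.
type Span = { start: number; end: number };
type Range = [number, number];

// Assumed mergeSpans semantics (inferred from the tests): drop empty or
// inverted spans, sort by start, then coalesce overlapping or adjacent spans.
function mergeSpansSketch(spans: readonly Span[]): Range[] {
  const valid = spans
    .filter((s) => s.end > s.start)
    .sort((a, b) => a.start - b.start || a.end - b.end);
  const out: Range[] = [];
  for (const s of valid) {
    const last = out[out.length - 1];
    if (last !== undefined && s.start <= last[1]) {
      last[1] = Math.max(last[1], s.end); // overlap or adjacency extends the run
    } else {
      out.push([s.start, s.end]);
    }
  }
  return out;
}

// Simplified canonical JSON: recursively sorted object keys, no whitespace.
// NOT a full RFC 8785 implementation.
function canonicalSketch(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalSketch).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalSketch(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Project chunks to { budgetTokens, selections } and hash it. tokenCount and
// chunk text never enter the projection, so they cannot move the hash.
function decisionHashSketch(
  chunks: readonly { path: string; startByte: number; endByte: number }[],
  budgetTokens: number,
): string {
  const byPath = new Map<string, Span[]>();
  for (const c of chunks) {
    const list = byPath.get(c.path) ?? [];
    list.push({ start: c.startByte, end: c.endByte });
    byPath.set(c.path, list);
  }
  const selections = Array.from(byPath.entries())
    .map(([path, spans]) => ({ path, ranges: mergeSpansSketch(spans) }))
    .filter((s) => s.ranges.length > 0) // a path with no real ranges is not a selection
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  return createHash("sha256").update(canonicalSketch({ budgetTokens, selections })).digest("hex");
}
```

Because the hash input carries only `budgetTokens` and the merged `(path, ranges)` pairs, a `tokenCount`-only drift cannot change it, which is the property the `decisionSetFromChunks` tests assert.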
+describe("canonicalDecisionSet", () => { + it("serializes byte-identically for the same set", () => { + const set: DecisionSet = { + budgetTokens: 100, + selections: [{ path: "a.ts", ranges: [[0, 10]] }], + }; + assert.equal(canonicalDecisionSet(set), canonicalDecisionSet(set)); + }); +}); + +describe("diffDecisionSets", () => { + it("reports equivalent for identical sets", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const diff = diffDecisionSets(a, b); + assert.equal(diff.equivalent, true); + assert.equal(diff.onlyInA.length, 0); + assert.equal(diff.onlyInB.length, 0); + assert.equal(diff.rangeDeltas.length, 0); + }); + + it("names paths present in only one set", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10), chunk("shared.ts", 0, 5)], 100); + const b = decisionSetFromChunks([chunk("b.ts", 0, 10), chunk("shared.ts", 0, 5)], 100); + const diff = diffDecisionSets(a, b); + assert.equal(diff.equivalent, false); + assert.deepEqual(diff.onlyInA, ["a.ts"]); + assert.deepEqual(diff.onlyInB, ["b.ts"]); + assert.equal(diff.rangeDeltas.length, 0, "shared.ts ranges match"); + }); + + it("reports range deltas for a shared path whose ranges differ", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 20)], 100); + const diff = diffDecisionSets(a, b); + assert.equal(diff.equivalent, false); + assert.equal(diff.rangeDeltas.length, 1); + assert.equal(diff.rangeDeltas[0]?.path, "a.ts"); + assert.deepEqual(diff.rangeDeltas[0]?.a, [[0, 10]]); + assert.deepEqual(diff.rangeDeltas[0]?.b, [[0, 20]]); + }); +}); diff --git a/packages/pack/src/decision-set.ts b/packages/pack/src/decision-set.ts new file mode 100644 index 0000000..ac30ee6 --- /dev/null +++ b/packages/pack/src/decision-set.ts @@ -0,0 +1,183 @@ +/** + * Decision set + `decisionHash` (spec 011 / ADR 0020). 
+ * + * The pack's contract pivoted from byte-identity to **decision-equivalence** + * (ADR 0020): two packs built from the same inputs are equivalent iff they + * select the **same decision set** — the same files + byte ranges, under the + * same budget — regardless of `tokenCount`, `pins`, chunk text bytes, or + * serialization. Byte-identity (`packHash`) stays a cheap *sufficient witness*. + * + * This module computes the decision set as a normalized projection of the two + * pack artifacts that already encode "which file, which byte range, selected": + * - `ast-chunks.jsonl` — each row's `(path, startByte, endByte)` triple. + * - `context-bom.json` — each file component's merged `byteRanges`. + * ast-chunks is preferred; the context-bom is the fallback/cross-check. + * + * The projection deliberately EXCLUDES the incidental fields whose drift is + * decision-irrelevant: `tokenCount`, `pins` (chonkie version, grammar + * commits), chunk text, per-file `fileHash`, and provenance (`commit`). + * `decisionHash` is `sha256(canonicalJson(decisionSet))` — the same RFC 8785 + * machinery as `packHash`, so two `replay` runs over the same packs serialize + * identically. + */ + +import { canonicalJson, sha256Hex } from "@opencodehub/core-types"; +import { type ByteSpan, mergeSpans } from "./context-bom.js"; + +/** A `[start, end)` byte range, surfaced as a 2-tuple for compact hashing. */ +export type RangeTuple = readonly [start: number, end: number]; + +/** One file's selection: its path + the merged, sorted byte ranges chosen. */ +export interface Selection { + readonly path: string; + /** Sorted, non-overlapping `[start, end)` ranges (from {@link mergeSpans}). */ + readonly ranges: readonly RangeTuple[]; +} + +/** The normalized, incidental-free decision set of a pack. */ +export interface DecisionSet { + /** The budget the selection was made under — different budgets differ by design. 
*/ + readonly budgetTokens: number; + /** Selections sorted by path ASC; each path's ranges sorted + merged. */ + readonly selections: readonly Selection[]; +} + +/** A chunk row as read from `ast-chunks.jsonl` (the {@link AstChunk} shape). */ +interface ChunkLike { + readonly path: string; + readonly startByte: number; + readonly endByte: number; +} + +/** + * Build the decision set from AST chunks. Groups chunks by path, merges each + * path's spans into sorted non-overlapping ranges, and sorts paths. Pure. + */ +export function decisionSetFromChunks( + chunks: readonly ChunkLike[], + budgetTokens: number, +): DecisionSet { + const byPath = new Map<string, ByteSpan[]>(); + for (const c of chunks) { + const spans = byPath.get(c.path); + const span: ByteSpan = { start: c.startByte, end: c.endByte }; + if (spans === undefined) byPath.set(c.path, [span]); + else spans.push(span); + } + return assembleDecisionSet(byPath, budgetTokens); +} + +/** + * Build the decision set from per-path byte spans (e.g. the context-bom's + * `byteRanges`). The fallback path when ast-chunks is absent or empty. Pure. + */ +export function decisionSetFromByteRanges( + byteRangesByPath: ReadonlyMap<string, readonly ByteSpan[]>, + budgetTokens: number, +): DecisionSet { + const byPath = new Map<string, ByteSpan[]>(); + for (const [path, spans] of byteRangesByPath) { + byPath.set(path, [...spans]); + } + return assembleDecisionSet(byPath, budgetTokens); +} + +/** Merge + sort the per-path spans into the canonical {@link DecisionSet}. 
*/ +function assembleDecisionSet( + byPath: ReadonlyMap<string, readonly ByteSpan[]>, + budgetTokens: number, +): DecisionSet { + const selections: Selection[] = []; + for (const [path, spans] of byPath) { + const merged = mergeSpans(spans); + if (merged.length === 0) continue; // a path with no real ranges is not a selection + selections.push({ + path, + ranges: merged.map((s) => [s.start, s.end] as const), + }); + } + selections.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ?
1 : 0)); + return { budgetTokens, selections }; +} + +/** + * The `decisionHash` — `sha256(canonicalJson(decisionSet))`. Same RFC 8785 + * helper as `packHash`, so it is byte-stable across processes given the same + * decision set. + */ +export function decisionHash(set: DecisionSet): string { + return sha256Hex(canonicalDecisionSet(set)); +} + +/** Canonical JSON of a decision set — exported so callers can hash/compare it. */ +export function canonicalDecisionSet(set: DecisionSet): string { + // The DecisionSet shape is already canonical (sorted selections, merged + // ranges); routing through canonicalJson sorts object keys + fixes number + // format so the bytes match packHash's discipline exactly. + return canonicalJson(set); +} + +/** The structured difference between two decision sets (the `DIVERGED` output). */ +export interface DecisionDiff { + /** True when the two sets select identically (same paths + ranges). */ + readonly equivalent: boolean; + /** Paths selected in A but not B. */ + readonly onlyInA: readonly string[]; + /** Paths selected in B but not A. */ + readonly onlyInB: readonly string[]; + /** Shared paths whose merged ranges differ, with both sides' ranges. */ + readonly rangeDeltas: readonly { + readonly path: string; + readonly a: readonly RangeTuple[]; + readonly b: readonly RangeTuple[]; + }[]; +} + +/** + * Diff two decision sets. Names paths present in only one set and, for shared + * paths, the range deltas. `equivalent` is true iff there are no path or range + * differences. Pure; the budget is compared by the caller (a budget mismatch + * is reported distinctly, not folded into this diff). 
+ */ +export function diffDecisionSets(a: DecisionSet, b: DecisionSet): DecisionDiff { + const aByPath = new Map<string, readonly RangeTuple[]>(a.selections.map((s) => [s.path, s.ranges])); + const bByPath = new Map<string, readonly RangeTuple[]>(b.selections.map((s) => [s.path, s.ranges])); + + const onlyInA: string[] = []; + const onlyInB: string[] = []; + const rangeDeltas: { path: string; a: readonly RangeTuple[]; b: readonly RangeTuple[] }[] = []; + + for (const [path, aRanges] of aByPath) { + const bRanges = bByPath.get(path); + if (bRanges === undefined) { + onlyInA.push(path); + } else if (!rangesEqual(aRanges, bRanges)) { + rangeDeltas.push({ path, a: aRanges, b: bRanges }); + } + } + for (const path of bByPath.keys()) { + if (!aByPath.has(path)) onlyInB.push(path); + } + + onlyInA.sort(); + onlyInB.sort(); + rangeDeltas.sort((x, y) => (x.path < y.path ? -1 : x.path > y.path ? 1 : 0)); + + return { + equivalent: onlyInA.length === 0 && onlyInB.length === 0 && rangeDeltas.length === 0, + onlyInA, + onlyInB, + rangeDeltas, + }; +} + +function rangesEqual(a: readonly RangeTuple[], b: readonly RangeTuple[]): boolean { + if (a.length !== b.length) return false; + for (let i = 0; i < a.length; i += 1) { + const ra = a[i]; + const rb = b[i]; + if (ra === undefined || rb === undefined) return false; + if (ra[0] !== rb[0] || ra[1] !== rb[1]) return false; + } + return true; +} diff --git a/packages/pack/src/index.ts b/packages/pack/src/index.ts index c39f563..cee8c67 100644 --- a/packages/pack/src/index.ts +++ b/packages/pack/src/index.ts @@ -43,6 +43,17 @@ export type { ContextFile, } from "./context-bom.js"; export { buildContextBom, mergeSpans } from "./context-bom.js"; +export { + canonicalDecisionSet, + type DecisionDiff, + type DecisionSet, + decisionHash, + decisionSetFromByteRanges, + decisionSetFromChunks, + diffDecisionSets, + type RangeTuple, + type Selection, +} from "./decision-set.js"; export type { DepRow, DepsOpts } from "./deps.js"; export { buildDeps } from 
"./deps.js"; export type { FileTreeNode,
FileTreeOpts } from "./file-tree.js"; From a736784f9b6c2f3ca88aff0201b8b1a1f4673eb4 Mon Sep 17 00:00:00 2001 From: Laith Al-Saadoon Date: Tue, 30 Jun 2026 12:55:44 +0000 Subject: [PATCH 3/3] test(cli): fix Windows path-resolution flake in seamed replay test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `runReplayCompare` calls `resolve(dir)` before the injected `_loadPack`, so the resolved path is platform-dependent — on Windows the POSIX `/fake/hashA` fixture key became `C:\fake\hashA` and the map lookup missed, throwing in all five seamed comparator tests. The loads are sequential (A then B), so the fake now serves packs in call order instead of keying on the unstable resolved path. Real cross-platform bug in the test harness, not a flake. --- packages/cli/src/commands/replay.test.ts | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/packages/cli/src/commands/replay.test.ts b/packages/cli/src/commands/replay.test.ts index d373f8e..d33c6af 100644 --- a/packages/cli/src/commands/replay.test.ts +++ b/packages/cli/src/commands/replay.test.ts @@ -45,14 +45,15 @@ const chunk = (path: string, startByte: number, endByte: number) => ({ path, sta describe("runReplayCompare (seamed)", () => { async function compare(a: LoadedPack, b: LoadedPack) { - const byDir = new Map([ - [a.dir, a], - [b.dir, b], - ]); + // `runReplayCompare` calls `resolve(dir)` before the loader, so the + // resolved path is platform-dependent (POSIX vs Windows). It always loads + // A then B sequentially, so the fake serves packs in call order rather than + // keying on the (unstable) resolved path. 
+    const queue = [a, b];
     return runReplayCompare(a.dir, b.dir, {
-      _loadPack: async (dir) => {
-        const p = byDir.get(dir);
-        if (p === undefined) throw new Error(`no fake pack at ${dir}`);
+      _loadPack: async () => {
+        const p = queue.shift();
+        if (p === undefined) throw new Error("fake loader called more than twice");
         return p;
       },
     });
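The test fix above replaces a path-keyed fake with one that answers in call order. The same idea can be factored into a small reusable helper. What follows is a minimal sketch that assumes a Node.js/TypeScript test setup. `sequentialFake` and `FakePack` are hypothetical names used for illustration and are not part of the codebase. The sketch also generalizes the "called more than twice" guard to any number of canned values.

```typescript
// Hypothetical helper, not part of the codebase: a fake loader that serves
// canned values in call order instead of looking them up by path. The code
// under test may normalize the path first (e.g. `resolve()` turning
// "/fake/hashA" into "C:\fake\hashA" on Windows), so the key is ignored.
function sequentialFake<T>(items: readonly T[]): (key?: string) => Promise<T> {
  let calls = 0;
  return async (_key?: string): Promise<T> => {
    if (calls >= items.length) {
      // Fail loudly when the code under test loads more often than expected.
      throw new Error(`fake loader called more than ${items.length} times`);
    }
    const item = items[calls] as T;
    calls += 1;
    return item;
  };
}

// Minimal stand-in for a loaded pack; only the directory is modeled here.
interface FakePack {
  readonly dir: string;
}

// Each call returns the next pack, however the caller spelled the path.
const loadPack = sequentialFake<FakePack>([{ dir: "/fake/hashA" }, { dir: "/fake/hashB" }]);
```

In the seamed test, this would slot in as `_loadPack: sequentialFake([a, b])`. The trade-off is that the fake no longer checks which directory was requested. Instead, it relies on the comparator loading A before B, which the commit message states it always does. If that ordering ever changed, the packs could be swapped without an error.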