diff --git a/.erpaval/specs/011-replay-decision-equivalence/spec.md b/.erpaval/specs/011-replay-decision-equivalence/spec.md new file mode 100644 index 0000000..58abf41 --- /dev/null +++ b/.erpaval/specs/011-replay-decision-equivalence/spec.md @@ -0,0 +1,126 @@ +# Spec 011 — `codehub replay`: assert decision-equivalence structurally + +**Status:** Draft for review (NO code yet — review the contract pivot first). +**Author:** Bonk + Laith · **Date:** 2026-06-30 +**Branch:** `spec/011-replay-decision-equivalence` (off `main` @ `278702a`) +**Roadmap origin:** M-W-F run 2026-06-29, Move 6 ruling (decision-equivalence). This is the *structural* half; spec 010 / `pack --variance-probe` is the *behavioral* half (shipped, PR #269). +**Companion ADR:** 0020 — "decision-equivalence supersedes byte-identity as the pack contract" (drafted alongside this spec; 0020 is the next free ADR number, confirmed). + +--- + +## 0. The two halves of Move 6, and why this one is the keystone + +Laith ruled (2026-06-30): the pack contract pivots from **byte-identity** to **decision-equivalence** — "same inputs ⇒ provably the same *retrieval decision set* (same files + byte ranges selected under the same budget); byte-identity is one cheap witness, not the contract." + +- **Move 2 (`pack --variance-probe`, shipped)** measures the contract *behaviorally*: does an agent's answer wander less with the pack? A number, but an *empirical* one — it runs a stochastic agent and observes outcomes. +- **Move 6 structural half (`codehub replay`, this spec)** asserts the contract *structurally*: given the same inputs, did OCH select the **same decision set**? No agent, no stochasticity — a deterministic structural check. + +Laith's framing: this is "critical for data-backed results on how well OCH does." The variance probe says *the pack helps*; `replay` says *the pack is what we claim it is* — reproducible at the decision level, even when the bytes legitimately drift. 
Behavioral benefit (Move 2) on top of a structural guarantee (this) is the proof story. + +## 1. Diagnosis — why byte-identity is the wrong *contract* (it's the right *witness*) + +The standing invariant is **ROADMAP constraint U1: "graphHash byte-identity per commit"** (`.erpaval/ROADMAP.md:219` names it the one breaking-change budget OCH must preserve; `:201-202` lists U1/U2 as CI gates). It is asserted across six ADRs — the canonical statement is **ADR 0011** (`graphHash` = SHA-256 of the canonical-JSON `{edges, nodes}` projection, store-agnostic), and **ADR 0019** titles a section "graphHash byte-identity (the go/no-go)". The pack inherits it: `packHash` (`manifest.ts:52`) is `sha256(canonicalJson(manifest))`, and `pack-determinism.test.ts` asserts two runs produce byte-identical BOM files. The user-facing promise is `readme.ts:73`: *"same `(commit, tokenizer_id, budget_tokens, chonkie_version, grammar_commits)` produces a byte-identical pack and the same `pack_hash`."* + +Byte-identity is a fine *witness* but a brittle *contract*, because bytes are a poor proxy for the decision the auditor actually cares about: + +- **The `packHash` preimage binds incidental fields.** It includes `pins.chonkieVersion`, `pins.grammarCommits`, and each BOM file's `fileHash` (`manifest.ts:82-101`). A chonkie bump, a grammar-pin refresh, or a `tokenCount` recompute flips `packHash` — even when *the exact same byte ranges of the exact same files were selected under the same budget*. The promise in `readme.ts:73` literally lists `chonkie_version` and `grammar_commits` as inputs, conceding that a toolchain bump is a "different" pack. Under decision-equivalence it is the *same* pack. +- **The embedder-swap precedent — stated precisely.** The #252 swap (gte-modernbert → F2LLM-v2-80M, 320-dim) is the canonical "decision-irrelevant change." 
Precision matters here, because spec 010 §0 over-stated it: embeddings are **not** in the pack (the Parquet sidecar was dropped in ADR 0019; the BOM is **8 items**), and `graphHash` is embedder-neutral by construction (ADR 0014: it hashes only `{nodes, edges}`, never `store_meta`). So the swap breaks **neither** `packHash` **nor** `graphHash` today — it invalidates the `embeddings` table and the `store_meta` embedder fingerprint, forcing a re-index. The lesson is the general one: a legitimate change to *how* OCH builds the index (a better embedder, a newer grammar, a re-tokenizer) is exactly the kind of change a naive "did the bytes change?" auditor misreads as "the pack changed," when the retrieval decision — which files/ranges the agent saw — is identical. +- **An auditor doesn't care about bytes.** They care whether the agent's context came from the right places. Byte-identity over-promises (claims more than the contract needs) and under-delivers (breaks on changes the contract should tolerate). + +The fix: make the **decision set** the contract of record, and keep byte-identity as a *sufficient-but-not-necessary witness* of it. + +## 1.5. There is already a byte-identity `replay` — this supersedes its comparator + +A `codehub replay ` + `pack --prove` implementation already exists on the unmerged branch `feat/v1-distribution-breadth` (`e6a81c2`, not an ancestor of `main`). It is the **byte-identity predecessor** this spec supersedes. Its design (worth reusing): + +- `runReplay(hash)` reads `/.codehub/packs//manifest.json`, parses snake_case→camelCase. +- **Integrity tier** (always, offline): re-hash every BOM body on disk vs its attested `fileHash`; mismatch → hard drift. +- **Recompute tier:** re-derive `packHash` via `buildManifest`, assert equality. +- **Optional re-pack tier:** an injected `RepackDriver` checks out the commit, re-runs the packer, **byte-compares** `packHash`. 
`best_effort` (Claude tokenizer) tolerates re-pack drift; `strict`/`degraded` hard-fail on any byte difference. +- Verdict via `replayVerdict(r) → { line, exitCode }`. + +**What spec 011 changes:** the re-pack tier's comparator flips from *byte-identity* to *decision-equivalence*. A re-pack that drifts in bytes but selects the same decision set → `EQUIVALENT` (today: a `strict`-class drift would hard-fail). A re-pack that changes the decision set → `DIVERGED` (fail). The integrity + recompute tiers stay as the cheap byte-witness fast path. + +**Reuse + cleanup:** lift `parseManifest`, the tiered verdict, the `RepackDriver` seam, and `recomputePackHash` from `e6a81c2`. Its `parseManifest` still reads a `duckdb_version` pin and `schemaVersion: 1` — both stale post-ADR-0019 (current schema is `2`, no duckdb pin); drop them on the rebase. (Also: the `code-pack` CLI description and the ROADMAP still say "9-item BOM" — stale since ADR 0019; clean up to "8-item" in passing.) + +## 2. What the decision set IS (grounded in the current pack) + +The pack already encodes the decision set — this spec *projects* existing artifacts, it invents no new shape: + +- **`ast-chunks.jsonl`** — each row is an `AstChunk` (`ast-chunker.ts:68`): `{ path, startByte, endByte, tokenCount, language? }`, sorted `(path ASC, startByte ASC, endByte ASC)`. The `(path, startByte, endByte)` triple is *literally* "which file, which byte range, was selected under budget." +- **`context-bom.json`** — each `file` component (`context-bom.ts`) carries path, content hash, and an optional `opencodehub:byteRanges` property: merged, sorted, non-overlapping `[start, end)` spans (`mergeSpans`, `context-bom.ts:170`) — "the union of bytes read from this file." This is already a deterministic, byte-range projection independent of chunk text. + +> **The decision set** of a pack is the set of `(path, mergedByteRanges)` selections, taken under a given `budgetTokens`. 
Two packs are **decision-equivalent** iff their decision sets are equal — same paths, same merged byte ranges per path, same budget — regardless of `tokenCount`, `pins`, chunk text bytes, or serialization. + +Note (`ast-chunker.ts:30`): `startByte`/`endByte` are currently UTF-16 code-unit offsets stored under byte names (coincide with UTF-8 for ASCII). The comparator treats them as opaque offsets — equivalence is well-defined as long as both packs use the same convention, which they do. See Q1 for the optional line-granularity mode. + +## 3. The `decisionHash` — a normalized projection + +`replay` introduces a **`decisionHash`**: a hash over a canonical, incidental-free projection of the decision set. + +``` +decisionSet(pack) = + { + budgetTokens, # the budget the selection was made under + selections: [ # sorted by path + { path, ranges: mergedByteRanges(path) } # ranges = sorted non-overlapping [start,end) + ] + } +decisionHash = sha256(canonicalJson(decisionSet)) # same RFC 8785 helper as packHash +``` + +Deliberately **excluded** (the whole point): `tokenCount` per chunk; `pins.chonkieVersion` / `pins.grammarCommits`; chunk *text bytes* and per-file `fileHash`; `commit` / `repoOriginUrl` (provenance — reported, not hashed). + +Deliberately **included**: `path` + merged byte ranges (the selection); `budgetTokens` (the constraint — different budgets are *expected* to differ; reported distinctly, not as a violation). + +**Source of ranges.** Prefer `ast-chunks.jsonl` `(startByte,endByte)` merged per path; fall back to `context-bom.json` `byteRanges` when ast-chunks is absent/degraded. They should agree; `replay` flags when they don't (a real internal-consistency bug signal). + +**Relationship to `packHash`:** `packHash` equality ⇒ `decisionHash` equality (cheap witness — matching bytes trivially match the decision). `decisionHash` equality does NOT require `packHash` equality (the contract tolerates incidental drift). 
Fast path: if `packHash` matches, report `EQUIVALENT` without computing the projection (R3); otherwise compute and compare `decisionHash`. + +## 4. The `codehub replay` command + +Two modes — extend the existing `replay ` self-check, add a two-pack compare: + +``` +codehub replay [--repo ] [--repack] [--json] [--budget-strict] # self-check (extends e6a81c2) +codehub replay --compare [--json] [--budget-strict] # two-pack compare (new) +``` + +- **Self-check `replay `** — reads `/.codehub/packs//`. Integrity + recompute tiers (byte witness) stay. With `--repack`, re-pack the recorded `commit` and assert **decision-equivalence** (not byte-identity) against the stored pack. This is the structural analog of `codehub status`'s staleness record: "is this pack still the decision OCH would make today?" +- **Two-pack `--compare A B`** — read two pack dirs, project each to its decision set, compare. The minimal unit that proves the projection; no store, no re-pack. +- **Verdict** — `EQUIVALENT` (decision sets match) · `DIVERGED` (selections differ) · `BUDGET_MISMATCH` (different `budgetTokens` — reported distinctly; a violation only under `--budget-strict`). +- **On `DIVERGED`** — structured diff: paths only in A, paths only in B, and per-path range deltas (ranges added/removed). This is the actionable output — *what the agent would have seen differently*. +- **Exit code** — 0 on `EQUIVALENT`; non-zero on `DIVERGED` (and on `BUDGET_MISMATCH` only under `--budget-strict`). Usable as an on-demand structural gate. +- **`--json`** — full record (verdict + both `decisionHash`es + `packHash`es + diff) on stdout; human summary on stderr (context-bom discipline). The record is a pure function of the inputs — no clock/run-id — so it serializes reproducibly. + +## 5. Where it lives + shape + +- **`@opencodehub/pack`** gains the projection: a pure `buildDecisionSet(astChunks, contextBom) → DecisionSet` + `decisionHash(DecisionSet) → string`, exported beside the existing builders. 
It belongs in `pack` because it reuses the same determinism machinery (`canonicalJson`, the BOM shapes) and `replay` is a *reader* of pack artifacts. +- **CLI** `codehub replay` in `packages/cli/src/commands/replay.ts` (rebased from `e6a81c2`, comparator swapped), registered in `index.ts` next to `code-pack` (commander pattern; lazy `await import`). +- **Determinism of the projection itself:** `decisionSet` is a pure function of the input artifacts (no clock, no env), serialized through the same RFC 8785 `canonicalJson`. Two `replay` runs over the same packs print byte-identical records. + +## 6. EARS requirements (draft — for review) + +- **R1** WHEN given two packs built from the same `(commit, budget, tokenizer)`, `replay` SHALL compute each pack's `decisionHash` (a hash over the normalized `(path, mergedByteRanges, budgetTokens)` projection) and report `EQUIVALENT` iff they match. +- **R2** The `decisionHash` projection SHALL exclude `tokenCount`, `pins` (chonkie version, grammar commits), chunk text bytes, and per-file `fileHash`, so a toolchain-version bump that does not change the selection set yields the same `decisionHash`. +- **R3** WHERE `packHash` of the two packs is equal, `replay` SHALL short-circuit to `EQUIVALENT` without recomputing the projection (byte-identity is a sufficient witness). +- **R4** WHEN the decision sets differ, `replay` SHALL emit a structured diff naming paths present in only one pack and, for shared paths, the byte-range deltas — and SHALL exit non-zero. +- **R5** WHEN the two packs were built under different `budgetTokens`, `replay` SHALL report `BUDGET_MISMATCH` distinctly from `DIVERGED`, exiting zero by default and non-zero only under `--budget-strict`. +- **R6** The emitted `--json` record SHALL be a pure function of the inputs (no wall-clock/run-id), so the record serialization is reproducible. 
+- **R7** `replay` SHALL derive ranges from `ast-chunks.jsonl` when present and fall back to `context-bom.json` `byteRanges` otherwise, and SHALL flag when the two disagree for the same pack. +- **R8** The integrity + recompute tiers inherited from the `e6a81c2` `replay` (re-hash BOM bodies vs attested `fileHash`; recompute `packHash`) SHALL remain as the cheap byte-witness fast path; only the re-pack-equivalence comparator changes from byte-identity to decision-equivalence. + +## 7. Open questions for Laith (review before I build) + +1. **Byte ranges vs. line ranges.** ast-chunks records `(startByte, endByte)` as UTF-16 code-unit offsets today (`ast-chunker.ts:30`; coincide with UTF-8 for ASCII). Byte ranges are the precise contract; should I also offer a `--coarse` mode projecting to `(startLine, endLine)` for an encoding-robust, human-diffable view? I lean: byte ranges as the contract, line ranges as a reporting aid. +2. **On-demand vs. CI gate.** `replay` is deterministic and cheap (pure read + hash), unlike the variance probe. I lean: ship on-demand in v1; later add an opt-in `analyze`-time "this commit's pack is decision-equivalent to the last" assertion — but only after we've seen real diffs in practice. Don't gate CI on it until the projection is trusted. +3. **Does ADR 0020 retire the byte-identity gates, or layer over them?** I lean **layer, don't retire**: keep the `graphHash`/`packHash` byte-identity tests as the strict-witness fast path (cheap, valuable), and make decision-equivalence the *contract of record* that byte-identity is one way to satisfy. ADR 0020 reframes byte-identity from "the contract" (ROADMAP U1) to "a sufficient witness." Agree — or do you want the byte gates actually *relaxed* (e.g. let a pins-only delta pass the determinism gate)? +4. **v1 scope.** Two-pack `--compare A B` is the minimum that proves the projection and reuses no store/analyze. The `replay --repack` self-check needs a `RepackDriver` (checkout + re-pack). 
Ship two-pack compare + the inherited integrity/recompute tiers in v1, and `--repack` decision-equivalence in v2? (I lean yes.) +5. **Supersede or extend `e6a81c2`?** That branch's byte-identity `replay` is unmerged and 32 commits behind `main`. I lean: cherry-pick its scaffolding onto a fresh branch, drop the stale `duckdb_version`/`schemaVersion:1`, and land the decision-equivalence comparator in the same PR — so we don't carry two `replay`s. Agree? + +## 8. What this is NOT (scope guard) + +- Not the variance probe (spec 010 / Move 2 — behavioral half, shipped). +- Not a re-implementation of packing — `replay` is a pure *reader* of pack directories (plus an optional re-pack tier in v2). +- Not a CI gate in v1 — on-demand structural check (Q2). +- Not a graph-diff tool — it compares *pack decision sets*, not raw graphs (`detect_changes` already maps diffs to symbols — a different question). diff --git a/docs/adr/0020-decision-equivalence-supersedes-byte-identity.md b/docs/adr/0020-decision-equivalence-supersedes-byte-identity.md new file mode 100644 index 0000000..5d557be --- /dev/null +++ b/docs/adr/0020-decision-equivalence-supersedes-byte-identity.md @@ -0,0 +1,131 @@ +# ADR 0020 — Decision-equivalence is the pack contract; byte-identity is a witness, not the contract + +- Status: **Proposed** — 2026-06-30 (awaiting Laith's review; pairs with spec 011). +- Authors: Laith Al-Saadoon + Bonk. +- Branch: `spec/011-replay-decision-equivalence`. +- Amends (does not supersede): the byte-identity invariant asserted in + [ADR 0011 — Graph database backend](./0011-graph-db-backend.md) (the `graphHash` + invariant) and [ADR 0019 — Single-file SQLite storage](./0019-single-file-sqlite-storage.md) + (the "graphHash byte-identity (the go/no-go)" gate), and the ROADMAP U1/U2 + determinism constraints. Those gates **stay** — this ADR reframes what they + are *for*. 
It also supersedes the byte-identity comparator in the unmerged + `codehub replay` (`feat/v1-distribution-breadth`, `e6a81c2`). + +## Context + +The pack's reproducibility promise has been **byte-identity**: same inputs ⇒ +byte-identical artifact, witnessed by a hash. The chain: + +- **ROADMAP U1/U2** name "graphHash byte-identity per commit" and "deterministic + code-pack (same commit + tokenizer + budget → same bytes)" as the one + breaking-change budget OCH must preserve (`.erpaval/ROADMAP.md:201-202,219`). +- **ADR 0011** defines `graphHash` as the SHA-256 of the canonical-JSON + `{edges, nodes}` projection and gates it in CI; **ADR 0019** makes + byte-identical rebuild the migration go/no-go. +- The pack inherits it: `packHash = sha256(canonicalJson(manifest))` + (`packages/pack/src/manifest.ts:52`), and `pack-determinism.test.ts` asserts + two runs produce byte-identical BOM files. +- The user-facing promise (`packages/pack/src/readme.ts:73`): *"same + `(commit, tokenizer_id, budget_tokens, chonkie_version, grammar_commits)` + produces a byte-identical pack and the same `pack_hash`."* + +Byte-identity is a good *witness* but the wrong *contract*, because the bytes +bind things the auditor does not care about: + +1. **The `packHash` preimage includes incidental fields.** `pins.chonkieVersion`, + `pins.grammarCommits`, and every BOM file's `fileHash` are in the hash + (`manifest.ts:82-101`). A chonkie bump, a grammar-pin refresh, or a + `tokenCount` recompute flips `packHash` — even when the same byte ranges of + the same files were selected under the same budget. `readme.ts:73` literally + lists `chonkie_version` and `grammar_commits` as pack inputs, conceding that + a toolchain bump yields a "different" pack. The retrieval decision was + identical. + +2. **The embedder-swap precedent, stated precisely.** The #252 embedder swap + (gte-modernbert → F2LLM-v2-80M, 320-dim) is the canonical decision-irrelevant + change. 
Precision matters because the motivating prose (spec 010 §0) + over-stated the mechanism: embeddings are **not** in the pack — the Parquet + sidecar was dropped in ADR 0019, the BOM is **8 items**, and `graphHash` is + embedder-neutral by construction (ADR 0014: it hashes only `{nodes, edges}`, + never `store_meta`). So the swap breaks **neither** `packHash` **nor** + `graphHash` today; it invalidates the `embeddings` table and the `store_meta` + embedder fingerprint, forcing a re-index. The general lesson holds regardless: + a legitimate change to *how* OCH builds the index — a better embedder, a newer + grammar, a re-tokenizer — is exactly what a naive "did the bytes change?" + check misreads as "the pack changed," when which files/ranges the agent saw is + identical. + +3. **An auditor cares about the decision, not the bytes.** They want: did the + agent's context come from the right places? Byte-identity over-promises + (asserts more than the contract needs) and under-delivers (breaks on changes + the contract should tolerate). + +## Decision + +**The pack contract is decision-equivalence. Byte-identity is one sufficient +witness of it, not the contract itself.** + +- **Contract of record (decision-equivalence):** two packs built from the same + inputs are equivalent iff they have the same **decision set** — the same + `(path, mergedByteRanges)` selections under the same `budgetTokens` — + regardless of `tokenCount`, `pins`, chunk text bytes, or serialization. +- **Witness (byte-identity):** `packHash` equality ⇒ decision-equivalence + (matching bytes trivially match the decision). The existing `graphHash` / + `packHash` byte-identity gates **stay** as the cheap fast-path witness — they + are valuable and almost-free. They are reframed from "the contract" to "a + sufficient condition for satisfying the contract." +- **The decision set is a projection of existing artifacts**, not a new shape. 
+ It is computed from `ast-chunks.jsonl` (`{path, startByte, endByte}` per chunk, + `ast-chunker.ts:68`) with `context-bom.json`'s merged `byteRanges` + (`context-bom.ts:170`) as the fallback/cross-check. +- **`decisionHash`** is `sha256(canonicalJson(decisionSet))`, using the same + RFC 8785 `canonicalJson` helper as `packHash`. It deliberately **excludes** + `tokenCount`, `pins`, chunk text bytes, and per-file `fileHash`; it + **includes** `path`, merged byte ranges, and `budgetTokens`. +- **`codehub replay`** is the structural assertion tool (spec 011): it compares + two packs' decision sets (or re-packs and compares against a stored pack), + reporting `EQUIVALENT` / `DIVERGED` / `BUDGET_MISMATCH` with a structured diff. + It supersedes the byte-identity comparator in the unmerged `e6a81c2` `replay`, + reusing that branch's integrity + recompute tiers as the byte-witness fast + path and swapping only the re-pack comparator. +- **No gate is relaxed in this ADR.** The byte-identity CI gates continue to run + unchanged. Decision-equivalence is *added* as the contract they serve. Whether + to later let a pins-only delta pass the determinism gate (treating it as + decision-equivalent) is an explicit follow-up, not decided here (spec 011 Q3). + +## Consequences + +**Positive.** + +- The reproducibility claim becomes one OCH can defend against legitimate + toolchain evolution: "upgrade the chunker, swap the embedder, bump a grammar — + the pack's *decision* is provably unchanged," with `codehub replay` as the + receipt. This is the data-backed "how well does OCH do" story paired with the + Move 2 variance probe. +- The contract stops over-promising. A grammar-pin bump no longer counts as "the + pack changed" to an auditor reading a hash. +- `replay`'s diff output is actionable in a way a hash inequality never was: it + names *which files/ranges the agent would have seen differently*. 
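Reviewer sketch of the projection described in the Decision section: group chunk offsets by path, merge them into sorted non-overlapping ranges, and hash the result together with the budget. All names here (`Chunk`, `mergeRanges`, `projectDecisionSet`, `hashDecisionSet`, `stableJson`) are hypothetical and are not the `@opencodehub/pack` API. `stableJson` is a simplified sorted-key stand-in for the RFC 8785 `canonicalJson` helper, and merging touching spans (`[0,10)` + `[10,20)` into `[0,20)`) is an assumption about how `mergeSpans` behaves.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the real AstChunk carries more fields.
type Chunk = { path: string; startByte: number; endByte: number };
type Range = [number, number]; // half-open [start, end)
type DecisionSet = {
  budgetTokens: number;
  selections: { path: string; ranges: Range[] }[];
};

// Merge overlapping or touching spans into sorted, non-overlapping ranges.
// Assumption: touching spans are merged, as the spec's merged byteRanges suggest.
function mergeRanges(ranges: Range[]): Range[] {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const out: Range[] = [];
  for (const [start, end] of sorted) {
    const last = out[out.length - 1];
    if (last !== undefined && start <= last[1]) {
      last[1] = Math.max(last[1], end);
    } else {
      out.push([start, end]);
    }
  }
  return out;
}

// Project chunks to (path, merged ranges), sorted by path. tokenCount, pins,
// chunk text, and fileHash never enter the projection.
function projectDecisionSet(chunks: Chunk[], budgetTokens: number): DecisionSet {
  const byPath = new Map<string, Range[]>();
  for (const c of chunks) {
    const list = byPath.get(c.path) ?? [];
    list.push([c.startByte, c.endByte]);
    byPath.set(c.path, list);
  }
  const selections = [...byPath.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([path, ranges]) => ({ path, ranges: mergeRanges(ranges) }));
  return { budgetTokens, selections };
}

// Simplified stand-in for canonical JSON: object keys sorted, no whitespace.
// It is NOT a full RFC 8785 implementation (number formatting is not normalized).
function stableJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function hashDecisionSet(set: DecisionSet): string {
  return createHash("sha256").update(stableJson(set)).digest("hex");
}

// Two packs that chunk the same bytes differently select the same decision set.
const a = projectDecisionSet(
  [
    { path: "a.ts", startByte: 0, endByte: 10 },
    { path: "a.ts", startByte: 10, endByte: 20 },
  ],
  100,
);
const b = projectDecisionSet([{ path: "a.ts", startByte: 0, endByte: 20 }], 100);
console.log(hashDecisionSet(a) === hashDecisionSet(b));
```

Because the budget is part of the hashed projection, two packs built under different `budgetTokens` always hash differently in this sketch, which is why a comparator would need to check the budget first to report `BUDGET_MISMATCH` separately from `DIVERGED`.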
+ +**Negative / costs.** + +- A second hash (`decisionHash`) and a projection to maintain alongside + `packHash`. Mitigated: the projection is pure and small, lives in + `@opencodehub/pack` beside the builders, and reuses `canonicalJson`. +- Two notions of "same pack" (byte-identical vs decision-equivalent) is a concept + an operator must learn. Mitigated: `packHash` stays the default identity in + paths/UX; `decisionHash` surfaces only through `replay`. +- The `ast-chunks` offsets are UTF-16 code-unit indices today + (`ast-chunker.ts:30`), not true UTF-8 byte offsets (coincide for ASCII). + Decision-equivalence is well-defined as long as both packs use the same + convention (they do); a future promotion to true byte offsets is a + cross-cutting change tracked separately. + +**Follow-ups (not decided here).** + +- Whether to relax the byte-identity CI gates to accept decision-equivalent + packs (e.g. a pins-only delta) — spec 011 Q3. +- Whether `replay` becomes an `analyze`-time or CI assertion vs. staying + on-demand — spec 011 Q2. +- Doc-drift cleanup: the ROADMAP and `code-pack` CLI description still say + "9-item BOM"; it has been 8 since ADR 0019. diff --git a/packages/cli/src/commands/replay.test.ts b/packages/cli/src/commands/replay.test.ts new file mode 100644 index 0000000..d33c6af --- /dev/null +++ b/packages/cli/src/commands/replay.test.ts @@ -0,0 +1,255 @@ +/** + * Tests for `codehub replay --compare` (decision-equivalence). + * + * Strategy: the comparator (`runReplayCompare`) is exercised via the + * `_loadPack` seam with hand-built `LoadedPack`s — no filesystem. `loadPack` + * itself is tested against a real on-disk pack directory (manifest + + * ast-chunks + context-bom) so the snake_case parsing + integrity tier + the + * JSONL/CycloneDX parsers are covered end-to-end. 
+ */ + +import { strict as assert } from "node:assert"; +import { createHash } from "node:crypto"; +import { mkdtemp, rm, writeFile } from "node:fs/promises"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { after, before, describe, it } from "node:test"; +import { + type LoadedPack, + loadPack, + packDecisionSet, + replayVerdictLine, + runReplayCompare, + serializeReplayRecord, +} from "./replay.js"; + +const sha = (s: string) => createHash("sha256").update(s).digest("hex"); + +/** Build a LoadedPack from a required packHash and budget; chunks, byteRangesByPath, and integrityDrift default to empty. */ +function pack(over: Partial<LoadedPack> & { packHash: string; budget: number }): LoadedPack { + return { + dir: `/fake/${over.packHash}`, + manifest: { + packHash: over.packHash, + budgetTokens: over.budget, + commit: "c0ffee", + files: [], + }, + chunks: over.chunks ?? [], + byteRangesByPath: over.byteRangesByPath ?? new Map(), + integrityDrift: over.integrityDrift ?? [], + }; +} + +const chunk = (path: string, startByte: number, endByte: number) => ({ path, startByte, endByte }); + +describe("runReplayCompare (seamed)", () => { + async function compare(a: LoadedPack, b: LoadedPack) { + // `runReplayCompare` calls `resolve(dir)` before the loader, so the + // resolved path is platform-dependent (POSIX vs Windows). It always loads + // A then B sequentially, so the fake serves packs in call order rather than + // keying on the (unstable) resolved path. 
+ const queue = [a, b]; + return runReplayCompare(a.dir, b.dir, { + _loadPack: async () => { + const p = queue.shift(); + if (p === undefined) throw new Error("fake loader called more than twice"); + return p; + }, + }); + } + + it("EQUIVALENT via packHash fast path when hashes match (no projection needed)", async () => { + const a = pack({ packHash: "same", budget: 100, chunks: [chunk("a.ts", 0, 10)] }); + const b = pack({ packHash: "same", budget: 100, chunks: [chunk("a.ts", 0, 99)] }); + const r = await compare(a, b); + assert.equal(r.verdict, "EQUIVALENT"); + assert.equal(r.decisionHashA, undefined, "fast path skips the projection"); + }); + + it("EQUIVALENT when packHashes differ but the decision set matches (the contract)", async () => { + // Same selection, different incidental bytes → different packHash, same decision. + const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] }); + const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] }); + const r = await compare(a, b); + assert.equal(r.verdict, "EQUIVALENT"); + assert.equal(r.decisionHashA, r.decisionHashB, "decision hashes match"); + }); + + it("DIVERGED with a structured diff when selections differ", async () => { + const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] }); + const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 20)] }); + const r = await compare(a, b); + assert.equal(r.verdict, "DIVERGED"); + assert.ok(r.diff !== undefined); + assert.equal(r.diff?.rangeDeltas[0]?.path, "a.ts"); + assert.notEqual(r.decisionHashA, r.decisionHashB); + }); + + it("BUDGET_MISMATCH when budgets differ (reported distinctly, not DIVERGED)", async () => { + const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] }); + const b = pack({ packHash: "hashB", budget: 200, chunks: [chunk("a.ts", 0, 10)] }); + const r = await compare(a, b); + assert.equal(r.verdict, "BUDGET_MISMATCH"); + 
assert.equal(r.budgetA, 100); + assert.equal(r.budgetB, 200); + }); + + it("CORRUPT when either pack has integrity drift (refuses to compare)", async () => { + const a = pack({ packHash: "hashA", budget: 100, integrityDrift: ["ast-chunks.jsonl"] }); + const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] }); + const r = await compare(a, b); + assert.equal(r.verdict, "CORRUPT"); + assert.deepEqual(r.corruptItems, ["ast-chunks.jsonl"]); + }); + + it("falls back to context-bom byteRanges when ast-chunks is empty (R7)", async () => { + const a = pack({ + packHash: "hashA", + budget: 100, + byteRangesByPath: new Map([["a.ts", [{ start: 0, end: 10 }]]]), + }); + const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] }); + const r = await compare(a, b); + assert.equal(r.verdict, "EQUIVALENT", "byteRanges fallback == equivalent chunks"); + }); +}); + +describe("replayVerdictLine exit codes", () => { + const base = { packHashA: "a", packHashB: "b", budgetA: 100, budgetB: 100 } as const; + + it("EQUIVALENT → exit 0", () => { + assert.equal(replayVerdictLine({ verdict: "EQUIVALENT", ...base }, false).exitCode, 0); + }); + it("DIVERGED → exit 1", () => { + assert.equal(replayVerdictLine({ verdict: "DIVERGED", ...base }, false).exitCode, 1); + }); + it("CORRUPT → exit 1", () => { + assert.equal( + replayVerdictLine({ verdict: "CORRUPT", ...base, corruptItems: ["x"] }, false).exitCode, + 1, + ); + }); + it("BUDGET_MISMATCH → exit 0 by default, 1 under --budget-strict", () => { + const r = { verdict: "BUDGET_MISMATCH", ...base, budgetB: 200 } as const; + assert.equal(replayVerdictLine(r, false).exitCode, 0); + assert.equal(replayVerdictLine(r, true).exitCode, 1); + }); +}); + +describe("serializeReplayRecord (R6 determinism)", () => { + it("is byte-identical across calls and carries no clock/run-id", () => { + const r = { + verdict: "DIVERGED" as const, + packHashA: "a", + packHashB: "b", + decisionHashA: "da", + decisionHashB: 
"db", + budgetA: 100, + budgetB: 100, + diff: { equivalent: false, onlyInA: ["x.ts"], onlyInB: [], rangeDeltas: [] }, + }; + const j1 = serializeReplayRecord(r); + const j2 = serializeReplayRecord(r); + assert.equal(j1, j2); + assert.ok(!j1.includes("timestamp") && !j1.includes("Date")); + }); +}); + +describe("packDecisionSet (projection precedence)", () => { + it("prefers ast-chunks over context-bom byteRanges", () => { + const p = pack({ + packHash: "h", + budget: 100, + chunks: [chunk("a.ts", 0, 10)], + byteRangesByPath: new Map([["zzz.ts", [{ start: 0, end: 999 }]]]), + }); + const set = packDecisionSet(p); + assert.deepEqual( + set.selections.map((s) => s.path), + ["a.ts"], + "ast-chunks wins; context-bom ignored when chunks present", + ); + }); +}); + +describe("loadPack (real on-disk)", () => { + let dir: string; + before(async () => { + dir = await mkdtemp(join(tmpdir(), "och-replay-pack-")); + // ast-chunks.jsonl — one canonical-JSON AstChunk per line. + const astChunks = [ + JSON.stringify({ path: "a.ts", startByte: 0, endByte: 10, tokenCount: 3 }), + JSON.stringify({ path: "a.ts", startByte: 10, endByte: 20, tokenCount: 2 }), + "", + ].join("\n"); + await writeFile(join(dir, "ast-chunks.jsonl"), astChunks, "utf8"); + // context-bom.json — CycloneDX with an opencodehub:byteRanges property. + const contextBom = JSON.stringify({ + bomFormat: "CycloneDX", + specVersion: "1.6", + components: [ + { + type: "file", + name: "a.ts", + properties: [{ name: "opencodehub:byteRanges", value: JSON.stringify([[0, 20]]) }], + }, + ], + }); + await writeFile(join(dir, "context-bom.json"), contextBom, "utf8"); + // manifest.json — snake_case wire form. fileHashes match the bodies above. 
+ const manifest = JSON.stringify({ + budget_tokens: 100, + commit: "c0ffee", + determinism_class: "strict", + files: [ + { kind: "ast-chunks", path: "ast-chunks.jsonl", file_hash: sha(astChunks) }, + { kind: "context-bom", path: "context-bom.json", file_hash: sha(contextBom) }, + ], + pack_hash: "deadbeef", + schema_version: 2, + }); + await writeFile(join(dir, "manifest.json"), manifest, "utf8"); + }); + after(async () => { + await rm(dir, { recursive: true, force: true }); + }); + + it("parses manifest (schema 2, no duckdb pin), ast-chunks, and context-bom ranges", async () => { + const loaded = await loadPack(dir); + assert.equal(loaded.manifest.packHash, "deadbeef"); + assert.equal(loaded.manifest.budgetTokens, 100); + assert.equal(loaded.chunks.length, 2, "two ast-chunk rows parsed (blank line skipped)"); + assert.equal(loaded.byteRangesByPath.get("a.ts")?.[0]?.end, 20); + assert.equal(loaded.integrityDrift.length, 0, "on-disk bytes match attested fileHashes"); + }); + + it("flags integrity drift when a body's bytes don't match its attested hash", async () => { + // Rewrite the manifest with a wrong fileHash for ast-chunks. 
+ const badManifest = JSON.stringify({ + budget_tokens: 100, + commit: "c0ffee", + determinism_class: "strict", + files: [{ kind: "ast-chunks", path: "ast-chunks.jsonl", file_hash: "0".repeat(64) }], + pack_hash: "deadbeef", + schema_version: 2, + }); + const badDir = await mkdtemp(join(tmpdir(), "och-replay-bad-")); + try { + await writeFile( + join(badDir, "ast-chunks.jsonl"), + '{"path":"a.ts","startByte":0,"endByte":1}', + "utf8", + ); + await writeFile(join(badDir, "manifest.json"), badManifest, "utf8"); + const loaded = await loadPack(badDir); + assert.deepEqual(loaded.integrityDrift, ["ast-chunks.jsonl"]); + } finally { + await rm(badDir, { recursive: true, force: true }); + } + }); + + it("throws a clear error when the pack dir has no manifest", async () => { + await assert.rejects(() => loadPack(join(tmpdir(), "no-such-pack-dir")), /no pack at/); + }); +}); diff --git a/packages/cli/src/commands/replay.ts b/packages/cli/src/commands/replay.ts new file mode 100644 index 0000000..634d8a7 --- /dev/null +++ b/packages/cli/src/commands/replay.ts @@ -0,0 +1,361 @@ +/** + * `codehub replay --compare ` — assert two packs are + * decision-equivalent (spec 011 / ADR 0020). + * + * Decision-equivalence (the contract of record): two packs built from the same + * inputs are equivalent iff they select the **same decision set** — the same + * files + byte ranges, under the same budget — regardless of `tokenCount`, + * `pins`, chunk text, or serialization. Byte-identity (`packHash`) stays the + * cheap *sufficient witness*: if the two `packHash`es match, the decision + * trivially matches and we short-circuit (R3). + * + * Tiers (R8 — the cheap byte-witness layers from the prior byte-identity + * `replay` are kept, only the equivalence comparator changed to decision-set): + * 1. **Integrity** (always, offline): re-hash every BOM body on disk vs its + * attested `fileHash` in `manifest.json`. 
A drifted/corrupt pack is + * reported before any comparison — you can't compare a tampered pack. + * 2. **packHash fast path:** equal `packHash` ⇒ `EQUIVALENT` immediately. + * 3. **decision-equivalence:** project each pack to its decision set + * (ast-chunks preferred, context-bom `byteRanges` fallback — R7) and + * compare. Different `budgetTokens` ⇒ `BUDGET_MISMATCH` (R5). + * + * `console.log` to stdout is sanctioned in command modules (biome override); + * the JSON record goes to stdout, the human summary to stderr (the context-bom + * discipline). + */ + +import { createHash } from "node:crypto"; +import { existsSync } from "node:fs"; +import { readFile } from "node:fs/promises"; +import { join, resolve } from "node:path"; +import { canonicalJson } from "@opencodehub/core-types"; +import { + type DecisionDiff, + type DecisionSet, + decisionHash, + decisionSetFromByteRanges, + decisionSetFromChunks, + diffDecisionSets, +} from "@opencodehub/pack"; + +/** Minimal manifest fields `replay` reads (corrected for schema 2 — ADR 0019). */ +interface ReplayManifest { + readonly packHash: string; + readonly budgetTokens: number; + readonly commit: string; + readonly files: ReadonlyArray<{ + readonly kind: string; + readonly path: string; + readonly fileHash: string; + }>; +} + +/** A chunk row read from `ast-chunks.jsonl`. */ +interface AstChunkRow { + readonly path: string; + readonly startByte: number; + readonly endByte: number; +} + +/** Everything `replay` reads from one pack directory. */ +export interface LoadedPack { + readonly dir: string; + readonly manifest: ReplayManifest; + /** ast-chunks rows (empty when the file is absent/empty — production default). */ + readonly chunks: readonly AstChunkRow[]; + /** Per-path merged byte ranges parsed from context-bom.json. */ + readonly byteRangesByPath: ReadonlyMap>; + /** Integrity-tier drift: BOM bodies whose on-disk bytes ≠ attested fileHash. 
*/ + readonly integrityDrift: readonly string[]; +} + +export type ReplayVerdict = "EQUIVALENT" | "DIVERGED" | "BUDGET_MISMATCH" | "CORRUPT"; + +export interface ReplayResult { + readonly verdict: ReplayVerdict; + readonly packHashA: string; + readonly packHashB: string; + /** Decision hashes: set only when the decision tier ran (absent on the packHash fast path, BUDGET_MISMATCH, and CORRUPT). */ + readonly decisionHashA?: string; + readonly decisionHashB?: string; + readonly budgetA: number; + readonly budgetB: number; + /** The structured diff, present on DIVERGED. */ + readonly diff?: DecisionDiff; + /** Integrity drift surfaced from either pack (present on CORRUPT). */ + readonly corruptItems?: readonly string[]; +} + +export interface ReplayCompareArgs { + /** Test seam — inject a pack loader so tests skip the filesystem. */ + readonly _loadPack?: (dir: string) => Promise<LoadedPack>; +} + +/** + * Compare two pack directories for decision-equivalence. Pure given the loaded + * packs; the loader (default {@link loadPack}) is the only I/O. + */ +export async function runReplayCompare( + packDirA: string, + packDirB: string, + args: ReplayCompareArgs = {}, +): Promise<ReplayResult> { + const load = args._loadPack ?? loadPack; + const a = await load(resolve(packDirA)); + const b = await load(resolve(packDirB)); + + // Tier 1: integrity. A pack whose bytes disagree with its own manifest is + // corrupt — refuse to compare it (the comparison would be meaningless). + const corrupt = [...a.integrityDrift, ...b.integrityDrift]; + if (corrupt.length > 0) { + return { + verdict: "CORRUPT", + packHashA: a.manifest.packHash, + packHashB: b.manifest.packHash, + budgetA: a.manifest.budgetTokens, + budgetB: b.manifest.budgetTokens, + corruptItems: corrupt, + }; + } + + // Tier 2: packHash fast path (R3) — byte-identity is a sufficient witness. 
+ if (a.manifest.packHash === b.manifest.packHash) { + return { + verdict: "EQUIVALENT", + packHashA: a.manifest.packHash, + packHashB: b.manifest.packHash, + budgetA: a.manifest.budgetTokens, + budgetB: b.manifest.budgetTokens, + }; + } + + // Packs built under different budgets are expected to differ: report that + // distinctly (R5), before the decision diff (a different budget is not a + // contract violation). + if (a.manifest.budgetTokens !== b.manifest.budgetTokens) { + return { + verdict: "BUDGET_MISMATCH", + packHashA: a.manifest.packHash, + packHashB: b.manifest.packHash, + budgetA: a.manifest.budgetTokens, + budgetB: b.manifest.budgetTokens, + }; + } + + // Tier 3: decision-equivalence. + const setA = packDecisionSet(a); + const setB = packDecisionSet(b); + const diff = diffDecisionSets(setA, setB); + return { + verdict: diff.equivalent ? "EQUIVALENT" : "DIVERGED", + packHashA: a.manifest.packHash, + packHashB: b.manifest.packHash, + decisionHashA: decisionHash(setA), + decisionHashB: decisionHash(setB), + budgetA: a.manifest.budgetTokens, + budgetB: b.manifest.budgetTokens, + ...(diff.equivalent ? {} : { diff }), + }; +} + +/** + * Project a loaded pack to its decision set: ast-chunks preferred, context-bom + * `byteRanges` fallback (R7). Exported for tests. + */ +export function packDecisionSet(pack: LoadedPack): DecisionSet { + if (pack.chunks.length > 0) { + return decisionSetFromChunks(pack.chunks, pack.manifest.budgetTokens); + } + return decisionSetFromByteRanges(pack.byteRangesByPath, pack.manifest.budgetTokens); +} + +/** + * Load + parse a pack directory: manifest.json (snake_case → camelCase), + * ast-chunks.jsonl (JSONL), context-bom.json (CycloneDX byteRanges), and run + * the integrity tier (re-hash bodies vs attested fileHash). + */ +export async function loadPack(dir: string): Promise<LoadedPack> { + const manifestPath = join(dir, "manifest.json"); + if (!existsSync(manifestPath)) { + throw new Error( + `codehub replay: no pack at ${dir} (missing manifest.json). 
` + + "Pass a .codehub/packs// directory produced by `codehub code-pack`.", + ); + } + const manifest = parseManifest(await readFile(manifestPath, "utf8")); + + // Integrity tier: re-hash each BOM body on disk vs its attested digest. + const integrityDrift: string[] = []; + for (const f of manifest.files) { + const bodyPath = join(dir, f.path); + if (!existsSync(bodyPath)) { + integrityDrift.push(f.path); + continue; + } + const recomputed = sha256HexBytes(await readFile(bodyPath)); + if (recomputed !== f.fileHash) integrityDrift.push(f.path); + } + + const chunks = await loadAstChunks(dir); + const byteRangesByPath = await loadContextBomRanges(dir); + return { dir, manifest, chunks, byteRangesByPath, integrityDrift }; +} + +/** + * Parse the on-disk snake_case manifest into the fields `replay` needs. + * Corrected for schema 2 (ADR 0019): no `duckdb_version` pin, `budget_tokens` + * is read for the decision set. + */ +function parseManifest(json: string): ReplayManifest { + const w = JSON.parse(json) as Record; + const files = (w["files"] ?? []) as Array>; + return { + packHash: String(w["pack_hash"] ?? ""), + budgetTokens: Number(w["budget_tokens"] ?? 0), + commit: String(w["commit"] ?? ""), + files: files.map((f) => ({ + kind: String(f["kind"] ?? ""), + path: String(f["path"] ?? ""), + fileHash: String(f["file_hash"] ?? ""), + })), + }; +} + +/** Read `ast-chunks.jsonl` (one canonical-JSON AstChunk per line). Absent → []. */ +async function loadAstChunks(dir: string): Promise { + const p = join(dir, "ast-chunks.jsonl"); + if (!existsSync(p)) return []; + const text = await readFile(p, "utf8"); + const rows: AstChunkRow[] = []; + for (const line of text.split("\n")) { + const trimmed = line.trim(); + if (trimmed.length === 0) continue; + const row = JSON.parse(trimmed) as Record; + rows.push({ + path: String(row["path"] ?? ""), + startByte: Number(row["startByte"] ?? 0), + endByte: Number(row["endByte"] ?? 
0), + }); + } + return rows; +} + +/** + * Read `context-bom.json` and extract per-path byte ranges from the + * `opencodehub:byteRanges` property (a JSON-stringified `[[start,end],...]`). + * Absent → empty map. + */ +async function loadContextBomRanges( + dir: string, +): Promise>> { + const p = join(dir, "context-bom.json"); + const out = new Map>(); + if (!existsSync(p)) return out; + const doc = JSON.parse(await readFile(p, "utf8")) as { + components?: ReadonlyArray<{ + name?: unknown; + properties?: ReadonlyArray<{ name?: unknown; value?: unknown }>; + }>; + }; + for (const c of doc.components ?? []) { + const path = typeof c.name === "string" ? c.name : undefined; + if (path === undefined) continue; + const prop = (c.properties ?? []).find((x) => x.name === "opencodehub:byteRanges"); + if (prop === undefined || typeof prop.value !== "string") continue; + let pairs: unknown; + try { + pairs = JSON.parse(prop.value); + } catch { + continue; + } + if (!Array.isArray(pairs)) continue; + const ranges: { start: number; end: number }[] = []; + for (const pair of pairs) { + if (Array.isArray(pair) && pair.length === 2) { + ranges.push({ start: Number(pair[0]), end: Number(pair[1]) }); + } + } + if (ranges.length > 0) out.set(path, ranges); + } + return out; +} + +/** + * Render a {@link ReplayResult} to a one-line-plus-detail human summary and an + * exit code. Exported so the CLI action stays a thin shim and the mapping is + * unit-testable. + */ +export function replayVerdictLine( + r: ReplayResult, + budgetStrict: boolean, +): { line: string; exitCode: number } { + switch (r.verdict) { + case "EQUIVALENT": + return { line: "codehub replay: EQUIVALENT — same decision set", exitCode: 0 }; + case "BUDGET_MISMATCH": { + const line = `codehub replay: BUDGET_MISMATCH — A budget=${r.budgetA}, B budget=${r.budgetB} (decision sets not comparable under different budgets)`; + return { line, exitCode: budgetStrict ? 
1 : 0 }; + } + case "CORRUPT": + return { + line: `codehub replay: CORRUPT — on-disk bytes drifted from the manifest for: ${(r.corruptItems ?? []).join(", ")}`, + exitCode: 1, + }; + case "DIVERGED": + return { line: formatDivergedSummary(r), exitCode: 1 }; + } +} + +/** Multi-line human summary of a DIVERGED verdict (the actionable diff). */ +function formatDivergedSummary(r: ReplayResult): string { + const lines: string[] = ["codehub replay: DIVERGED — the packs select different decision sets"]; + const diff = r.diff; + if (diff !== undefined) { + for (const p of diff.onlyInA) lines.push(` only in A: ${p}`); + for (const p of diff.onlyInB) lines.push(` only in B: ${p}`); + for (const d of diff.rangeDeltas) { + lines.push(` ranges differ: ${d.path} A=${fmtRanges(d.a)} B=${fmtRanges(d.b)}`); + } + } + return lines.join("\n"); +} + +function fmtRanges(ranges: ReadonlyArray): string { + return `[${ranges.map(([s, e]) => `${s}-${e}`).join(",")}]`; +} + +/** + * Print a {@link ReplayResult}. JSON → stdout (machine consumers / `--json`); + * the human summary → stderr so it never pollutes a piped stdout. + */ +export function printReplayResult(r: ReplayResult, asJson: boolean, budgetStrict: boolean): void { + const { line } = replayVerdictLine(r, budgetStrict); + if (asJson) { + console.log(serializeReplayRecord(r)); + } else { + console.warn(line); + } +} + +/** Canonical JSON of the replay record — pure function of the inputs (R6). */ +export function serializeReplayRecord(r: ReplayResult): string { + // Reuse the decision-set canonical serializer's discipline by hand-building a + // stable object; the record carries no clock/run-id, so it is reproducible. 
+ const record: Record = { + verdict: r.verdict, + packHashA: r.packHashA, + packHashB: r.packHashB, + budgetA: r.budgetA, + budgetB: r.budgetB, + }; + if (r.decisionHashA !== undefined) record["decisionHashA"] = r.decisionHashA; + if (r.decisionHashB !== undefined) record["decisionHashB"] = r.decisionHashB; + if (r.diff !== undefined) record["diff"] = r.diff; + if (r.corruptItems !== undefined) record["corruptItems"] = r.corruptItems; + // The same RFC 8785 helper that backs packHash — sorts keys + normalizes + // numbers, so the record serializes byte-identically given the same inputs. + return canonicalJson(record); +} + +function sha256HexBytes(bytes: Uint8Array): string { + return createHash("sha256").update(bytes).digest("hex"); +} diff --git a/packages/cli/src/index.ts b/packages/cli/src/index.ts index 3b4d6f1..418b611 100644 --- a/packages/cli/src/index.ts +++ b/packages/cli/src/index.ts @@ -453,6 +453,42 @@ program } }); +program + .command("replay") + .description( + "Assert two code-packs are decision-equivalent (spec 011 / ADR 0020): same files + byte " + + "ranges selected under the same budget, regardless of incidental drift (tokenCount, pins, " + + "chunk text). packHash equality is the cheap witness; a decisionHash projection is the " + + "contract. Verdict: EQUIVALENT / DIVERGED / BUDGET_MISMATCH / CORRUPT. On-demand, never a CI gate.", + ) + .requiredOption( + "--compare ", + "Two pack directories (.codehub/packs//) to compare for decision-equivalence", + ) + .option( + "--json", + "Emit the full replay record (verdict + decisionHashes + diff) as JSON on stdout", + ) + .option( + "--budget-strict", + "Treat a BUDGET_MISMATCH (different --budget between the packs) as a failure exit", + ) + .action(async (opts: Record) => { + const mod = await import("./commands/replay.js"); + const packs = Array.isArray(opts["compare"]) ? 
(opts["compare"] as string[]) : []; + if (packs.length !== 2) { + throw new Error( + `codehub replay --compare expects exactly two pack directories, got ${packs.length}.`, + ); + } + const budgetStrict = opts["budgetStrict"] === true; + const [packA, packB] = packs as [string, string]; + const result = await mod.runReplayCompare(packA, packB); + mod.printReplayResult(result, opts["json"] === true, budgetStrict); + const { exitCode } = mod.replayVerdictLine(result, budgetStrict); + if (exitCode !== 0) process.exitCode = exitCode; + }); + program .command("query ") .description("Direct hybrid search against a repo's graph") diff --git a/packages/pack/src/decision-set.test.ts b/packages/pack/src/decision-set.test.ts new file mode 100644 index 0000000..47fbff1 --- /dev/null +++ b/packages/pack/src/decision-set.test.ts @@ -0,0 +1,128 @@ +import { strict as assert } from "node:assert"; +import { describe, it } from "node:test"; +import type { ByteSpan } from "./context-bom.js"; +import { + canonicalDecisionSet, + type DecisionSet, + decisionHash, + decisionSetFromByteRanges, + decisionSetFromChunks, + diffDecisionSets, +} from "./decision-set.js"; + +const chunk = (path: string, startByte: number, endByte: number, tokenCount = 1) => ({ + path, + startByte, + endByte, + tokenCount, +}); + +describe("decisionSetFromChunks", () => { + it("groups by path, merges adjacent/overlapping spans, sorts paths", () => { + const set = decisionSetFromChunks( + [ + chunk("b.ts", 10, 20), + chunk("a.ts", 0, 10), + chunk("a.ts", 10, 25), // adjacent to [0,10) → merges to [0,25) + chunk("b.ts", 0, 10), + ], + 100, + ); + assert.equal(set.budgetTokens, 100); + assert.deepEqual( + set.selections.map((s) => s.path), + ["a.ts", "b.ts"], + "paths sorted ASC", + ); + assert.deepEqual(set.selections[0]?.ranges, [[0, 25]], "a.ts spans merged"); + assert.deepEqual(set.selections[1]?.ranges, [[0, 20]], "b.ts spans merged"); + }); + + it("EXCLUDES tokenCount — a tokenCount-only drift is 
decision-equivalent", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10, 3)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 10, 999)], 100); + assert.equal(decisionHash(a), decisionHash(b), "tokenCount not in the projection"); + }); + + it("drops a path whose spans are all zero-length / inverted", () => { + const set = decisionSetFromChunks([chunk("a.ts", 5, 5), chunk("a.ts", 9, 3)], 100); + assert.equal(set.selections.length, 0, "no real ranges → not a selection"); + }); +}); + +describe("decisionSetFromByteRanges (context-bom fallback)", () => { + it("produces the same decision set as the equivalent chunks", () => { + const fromChunks = decisionSetFromChunks([chunk("a.ts", 0, 10), chunk("a.ts", 10, 20)], 100); + const ranges = new Map([["a.ts", [{ start: 0, end: 20 }]]]); + const fromRanges = decisionSetFromByteRanges(ranges, 100); + assert.equal(decisionHash(fromChunks), decisionHash(fromRanges)); + }); +}); + +describe("decisionHash", () => { + it("is stable across two calls (pure)", () => { + const set = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + assert.equal(decisionHash(set), decisionHash(set)); + }); + + it("differs when the selected byte ranges differ", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 12)], 100); + assert.notEqual(decisionHash(a), decisionHash(b)); + }); + + it("differs when the budget differs (budget is part of the decision)", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 10)], 200); + assert.notEqual(decisionHash(a), decisionHash(b)); + }); + + it("is independent of input chunk order (grouping is order-free)", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 5), chunk("b.ts", 0, 5)], 100); + const b = decisionSetFromChunks([chunk("b.ts", 0, 5), chunk("a.ts", 0, 5)], 100); + assert.equal(decisionHash(a), decisionHash(b)); + }); +}); + 
+describe("canonicalDecisionSet", () => { + it("serializes byte-identically for the same set", () => { + const set: DecisionSet = { + budgetTokens: 100, + selections: [{ path: "a.ts", ranges: [[0, 10]] }], + }; + assert.equal(canonicalDecisionSet(set), canonicalDecisionSet(set)); + }); +}); + +describe("diffDecisionSets", () => { + it("reports equivalent for identical sets", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const diff = diffDecisionSets(a, b); + assert.equal(diff.equivalent, true); + assert.equal(diff.onlyInA.length, 0); + assert.equal(diff.onlyInB.length, 0); + assert.equal(diff.rangeDeltas.length, 0); + }); + + it("names paths present in only one set", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10), chunk("shared.ts", 0, 5)], 100); + const b = decisionSetFromChunks([chunk("b.ts", 0, 10), chunk("shared.ts", 0, 5)], 100); + const diff = diffDecisionSets(a, b); + assert.equal(diff.equivalent, false); + assert.deepEqual(diff.onlyInA, ["a.ts"]); + assert.deepEqual(diff.onlyInB, ["b.ts"]); + assert.equal(diff.rangeDeltas.length, 0, "shared.ts ranges match"); + }); + + it("reports range deltas for a shared path whose ranges differ", () => { + const a = decisionSetFromChunks([chunk("a.ts", 0, 10)], 100); + const b = decisionSetFromChunks([chunk("a.ts", 0, 20)], 100); + const diff = diffDecisionSets(a, b); + assert.equal(diff.equivalent, false); + assert.equal(diff.rangeDeltas.length, 1); + assert.equal(diff.rangeDeltas[0]?.path, "a.ts"); + assert.deepEqual(diff.rangeDeltas[0]?.a, [[0, 10]]); + assert.deepEqual(diff.rangeDeltas[0]?.b, [[0, 20]]); + }); +}); diff --git a/packages/pack/src/decision-set.ts b/packages/pack/src/decision-set.ts new file mode 100644 index 0000000..ac30ee6 --- /dev/null +++ b/packages/pack/src/decision-set.ts @@ -0,0 +1,183 @@ +/** + * Decision set + `decisionHash` (spec 011 / ADR 0020). 
+ * + * The pack's contract pivoted from byte-identity to **decision-equivalence** + * (ADR 0020): two packs built from the same inputs are equivalent iff they + * select the **same decision set** — the same files + byte ranges, under the + * same budget — regardless of `tokenCount`, `pins`, chunk text bytes, or + * serialization. Byte-identity (`packHash`) stays a cheap *sufficient witness*. + * + * This module computes the decision set as a normalized projection of the two + * pack artifacts that already encode "which file, which byte range, selected": + * - `ast-chunks.jsonl` — each row's `(path, startByte, endByte)` triple. + * - `context-bom.json` — each file component's merged `byteRanges`. + * ast-chunks is preferred; the context-bom is the fallback/cross-check. + * + * The projection deliberately EXCLUDES the incidental fields whose drift is + * decision-irrelevant: `tokenCount`, `pins` (chonkie version, grammar + * commits), chunk text, per-file `fileHash`, and provenance (`commit`). + * `decisionHash` is `sha256(canonicalJson(decisionSet))` — the same RFC 8785 + * machinery as `packHash`, so two `replay` runs over the same packs serialize + * identically. + */ + +import { canonicalJson, sha256Hex } from "@opencodehub/core-types"; +import { type ByteSpan, mergeSpans } from "./context-bom.js"; + +/** A `[start, end)` byte range, surfaced as a 2-tuple for compact hashing. */ +export type RangeTuple = readonly [start: number, end: number]; + +/** One file's selection: its path + the merged, sorted byte ranges chosen. */ +export interface Selection { + readonly path: string; + /** Sorted, non-overlapping `[start, end)` ranges (from {@link mergeSpans}). */ + readonly ranges: readonly RangeTuple[]; +} + +/** The normalized, incidental-free decision set of a pack. */ +export interface DecisionSet { + /** The budget the selection was made under — different budgets differ by design. 
*/ + readonly budgetTokens: number; + /** Selections sorted by path ASC; each path's ranges sorted + merged. */ + readonly selections: readonly Selection[]; +} + +/** A chunk row as read from `ast-chunks.jsonl` (the {@link AstChunk} shape). */ +interface ChunkLike { + readonly path: string; + readonly startByte: number; + readonly endByte: number; +} + +/** + * Build the decision set from AST chunks. Groups chunks by path, merges each + * path's spans into sorted non-overlapping ranges, and sorts paths. Pure. + */ +export function decisionSetFromChunks( + chunks: readonly ChunkLike[], + budgetTokens: number, +): DecisionSet { + const byPath = new Map<string, ByteSpan[]>(); + for (const c of chunks) { + const spans = byPath.get(c.path); + const span: ByteSpan = { start: c.startByte, end: c.endByte }; + if (spans === undefined) byPath.set(c.path, [span]); + else spans.push(span); + } + return assembleDecisionSet(byPath, budgetTokens); +} + +/** + * Build the decision set from per-path byte spans (e.g. the context-bom's + * `byteRanges`). The fallback path when ast-chunks is absent. Pure. + */ +export function decisionSetFromByteRanges( + byteRangesByPath: ReadonlyMap<string, readonly ByteSpan[]>, + budgetTokens: number, +): DecisionSet { + const byPath = new Map<string, ByteSpan[]>(); + for (const [path, spans] of byteRangesByPath) { + byPath.set(path, [...spans]); + } + return assembleDecisionSet(byPath, budgetTokens); +} + +/** Merge + sort the per-path spans into the canonical {@link DecisionSet}. */ +function assembleDecisionSet( + byPath: ReadonlyMap<string, readonly ByteSpan[]>, + budgetTokens: number, +): DecisionSet { + const selections: Selection[] = []; + for (const [path, spans] of byPath) { + const merged = mergeSpans(spans); + if (merged.length === 0) continue; // a path with no real ranges is not a selection + selections.push({ + path, + ranges: merged.map((s) => [s.start, s.end] as const), + }); + } + selections.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 
1 : 0)); + return { budgetTokens, selections }; +} + +/** + * The `decisionHash` — `sha256(canonicalJson(decisionSet))`. Same RFC 8785 + * helper as `packHash`, so it is byte-stable across processes given the same + * decision set. + */ +export function decisionHash(set: DecisionSet): string { + return sha256Hex(canonicalDecisionSet(set)); +} + +/** Canonical JSON of a decision set — exported so callers can hash/compare it. */ +export function canonicalDecisionSet(set: DecisionSet): string { + // The DecisionSet shape is already canonical (sorted selections, merged + // ranges); routing through canonicalJson sorts object keys + fixes number + // format so the bytes match packHash's discipline exactly. + return canonicalJson(set); +} + +/** The structured difference between two decision sets (the `DIVERGED` output). */ +export interface DecisionDiff { + /** True when the two sets select identically (same paths + ranges). */ + readonly equivalent: boolean; + /** Paths selected in A but not B. */ + readonly onlyInA: readonly string[]; + /** Paths selected in B but not A. */ + readonly onlyInB: readonly string[]; + /** Shared paths whose merged ranges differ, with both sides' ranges. */ + readonly rangeDeltas: readonly { + readonly path: string; + readonly a: readonly RangeTuple[]; + readonly b: readonly RangeTuple[]; + }[]; +} + +/** + * Diff two decision sets. Names paths present in only one set and, for shared + * paths, the range deltas. `equivalent` is true iff there are no path or range + * differences. Pure; the budget is compared by the caller (a budget mismatch + * is reported distinctly, not folded into this diff). 
+ */ +export function diffDecisionSets(a: DecisionSet, b: DecisionSet): DecisionDiff { + const aByPath = new Map(a.selections.map((s) => [s.path, s.ranges])); + const bByPath = new Map(b.selections.map((s) => [s.path, s.ranges])); + + const onlyInA: string[] = []; + const onlyInB: string[] = []; + const rangeDeltas: { path: string; a: readonly RangeTuple[]; b: readonly RangeTuple[] }[] = []; + + for (const [path, aRanges] of aByPath) { + const bRanges = bByPath.get(path); + if (bRanges === undefined) { + onlyInA.push(path); + } else if (!rangesEqual(aRanges, bRanges)) { + rangeDeltas.push({ path, a: aRanges, b: bRanges }); + } + } + for (const path of bByPath.keys()) { + if (!aByPath.has(path)) onlyInB.push(path); + } + + onlyInA.sort(); + onlyInB.sort(); + rangeDeltas.sort((x, y) => (x.path < y.path ? -1 : x.path > y.path ? 1 : 0)); + + return { + equivalent: onlyInA.length === 0 && onlyInB.length === 0 && rangeDeltas.length === 0, + onlyInA, + onlyInB, + rangeDeltas, + }; +} + +function rangesEqual(a: readonly RangeTuple[], b: readonly RangeTuple[]): boolean { + if (a.length !== b.length) return false; + for (let i = 0; i < a.length; i += 1) { + const ra = a[i]; + const rb = b[i]; + if (ra === undefined || rb === undefined) return false; + if (ra[0] !== rb[0] || ra[1] !== rb[1]) return false; + } + return true; +} diff --git a/packages/pack/src/index.ts b/packages/pack/src/index.ts index c39f563..cee8c67 100644 --- a/packages/pack/src/index.ts +++ b/packages/pack/src/index.ts @@ -43,6 +43,17 @@ export type { ContextFile, } from "./context-bom.js"; export { buildContextBom, mergeSpans } from "./context-bom.js"; +export { + canonicalDecisionSet, + type DecisionDiff, + type DecisionSet, + decisionHash, + decisionSetFromByteRanges, + decisionSetFromChunks, + diffDecisionSets, + type RangeTuple, + type Selection, +} from "./decision-set.js"; export type { DepRow, DepsOpts } from "./deps.js"; export { buildDeps } from "./deps.js"; export type { FileTreeNode, 
FileTreeOpts } from "./file-tree.js";
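Reviewer note: the decision-set projection introduced in `decision-set.ts` can be exercised in isolation. The sketch below is a minimal, self-contained approximation and not the PR's code. `mergeSpansSketch`, `projectSketch`, and `hashSketch` are hypothetical names; the merge rule (drop zero-length or inverted spans, fuse overlapping or adjacent ones) is inferred from the tests in `decision-set.test.ts`; and `JSON.stringify` with a fixed key order stands in for the repository's `canonicalJson` helper, so the digests it produces will probably not match a real `decisionHash`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-ins for the PR's chunk row and DecisionSet shapes.
interface ChunkSketch {
  path: string;
  startByte: number;
  endByte: number;
  tokenCount?: number; // deliberately ignored by the projection
}
type RangeSketch = readonly [number, number];
interface DecisionSetSketch {
  budgetTokens: number;
  selections: { path: string; ranges: RangeSketch[] }[];
}

// Assumed merge rule: keep only spans with end > start, sort by start,
// then fuse any span that overlaps or touches the previous one.
function mergeSpansSketch(spans: readonly RangeSketch[]): RangeSketch[] {
  const real = spans.filter(([s, e]) => e > s).sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const out: [number, number][] = [];
  for (const [s, e] of real) {
    const last = out[out.length - 1];
    if (last !== undefined && s <= last[1]) last[1] = Math.max(last[1], e);
    else out.push([s, e]);
  }
  return out;
}

// Group by path, merge each path's spans, drop empty paths, sort paths ASC.
function projectSketch(chunks: readonly ChunkSketch[], budgetTokens: number): DecisionSetSketch {
  const byPath = new Map<string, RangeSketch[]>();
  for (const c of chunks) {
    const spans = byPath.get(c.path) ?? [];
    spans.push([c.startByte, c.endByte]);
    byPath.set(c.path, spans);
  }
  const selections = [...byPath]
    .map(([path, spans]) => ({ path, ranges: mergeSpansSketch(spans) }))
    .filter((s) => s.ranges.length > 0)
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  return { budgetTokens, selections };
}

// Stand-in for canonicalJson: keys are written in a fixed order by hand.
function hashSketch(set: DecisionSetSketch): string {
  const canonical = JSON.stringify({
    budgetTokens: set.budgetTokens,
    selections: set.selections.map((s) => ({ path: s.path, ranges: s.ranges })),
  });
  return createHash("sha256").update(canonical).digest("hex");
}

// Two adjacent chunks versus one merged chunk with a different tokenCount.
const a = projectSketch(
  [
    { path: "a.ts", startByte: 0, endByte: 10, tokenCount: 3 },
    { path: "a.ts", startByte: 10, endByte: 25, tokenCount: 2 },
  ],
  100,
);
const b = projectSketch([{ path: "a.ts", startByte: 0, endByte: 25, tokenCount: 999 }], 100);
console.log(hashSketch(a) === hashSketch(b)); // true
```

Both inputs collapse to the single range `[0, 25)` for `a.ts` under budget 100, which is the property the `EXCLUDES tokenCount` and `byteRanges fallback` tests rely on: chunk boundaries and token counts may drift without changing the decision.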