Skip to content

feat: implement advisor plans 001-006 (security, deps, tests, refactors)#158

Merged
ThomasK33 merged 6 commits into
mainfrom
advisor/implement-all-plans
Jun 16, 2026
Merged

feat: implement advisor plans 001-006 (security, deps, tests, refactors)#158
ThomasK33 merged 6 commits into
mainfrom
advisor/implement-all-plans

Conversation

@ThomasK33

Copy link
Copy Markdown
Member

Summary

The six improvement plans produced by an earlier read-only audit have been implemented on this branch. Each plan was executed in an isolated git worktree, verified against its own gates, and reviewed against its scope and done-criteria before being integrated here. The plan documents are committed under plans/ and reproduced in collapsible sections below (and in a follow-up comment) so the exact steps can be reviewed.

What changed, by plan

  • 001 — Harden local-state permissions (security). The per-session RPC socket directory is now created 0o700, the socket file 0o600, and persisted state files (session manifests and the Home Registry) 0o600, so another local user can no longer connect to a session's control socket or read its state under a permissive umask. A Unix-guarded integration test asserts the modes and that the owning user is unaffected.
  • 002 — Dependency-audit gate (security / CI). An aube audit --audit-level high step is wired into the mise ci task and the linux-static CI job, so high/critical advisories now fail the build instead of going unnoticed. The seven advisories present at audit time (all in transitive build/tooling dependencies) were cleared via package.json overrides; aube audit now reports zero vulnerabilities.
  • 003 — Release contract (docs). The RELEASE.md support contract no longer claims a stale "0.2.x" release line; the framing was made version-agnostic so release automation will not re-stale it.
  • 004 — hostMain characterization tests (tests). The four pure decision helpers in hostMain.ts are now exported (with no logic change) and covered by unit tests, and an idle-timeout auto-exit integration test was added. Coverage of the per-session orchestration core went from a single assertion to thirteen cases plus the new integration test.
  • 005 — Shared replay iteration (refactor). The replay-event sequence/ordering logic that was duplicated across the two renderer backends was extracted into a single tested helper (iterateInRangeReplayEvents); both backends now delegate to it. The change is behavior-preserving.
  • 006 — Split the ghostty-web backend (refactor). The ~790-line embedded harness HTML and the harness-decoding layer were extracted into sibling modules, shrinking src/renderer/ghosttyWeb/backend.ts from 2814 to ~1620 lines. The move is byte-identical, and the externally-consumed decode helpers are re-exported so existing imports keep resolving.

Verification

The integrated branch was run through the full gate: formatting (tracked files), lint, typecheck, workflow-lint, the new aube audit gate (0 vulnerabilities), build, unit tests (1329 passing), e2e (33 passing), and the packaging smoke test — all pass. Integration tests are 187/188.

The single integration failure — screen-hash.test.ts asserting that structured and text snapshots agree on screenHash — is pre-existing and unrelated to this change. It reproduces on the base commit c11e2e2 with byte-identical hashes, and the renderer refactor (005/006) produces those same hashes, which confirms rendering behaviour was preserved. It looks like a latent divergence between how the structured and text snapshot hashes are derived and is worth tracking separately.

Environment note: the integration/e2e numbers above reflect a local macOS run; CI runs on Linux, where this test may behave differently.

Notes for review

  • The changes were generated from the committed plan documents and integrated on this branch via merge; that internal merge history will be squashed on merge.
  • Unrelated local skill-tooling changes were intentionally left out to keep this change focused on the six plans.
  • Due to GitHub's body-size limit, plans 001–004 are inlined below and plans 005–006 follow in a comment.

Implementation plans 001–004

001-harden-local-state-permissions.md

Plan 001: agent-tty restricts its local socket and state files to the owning user

Executor instructions: Follow this plan step by step. Run every
verification command and confirm the expected result before moving to the
next step. If anything in the "STOP conditions" section occurs, stop and
report — do not improvise. When done, update the status row for this plan
in plans/README.md — unless a reviewer dispatched you and told you they
maintain the index.

Drift check (run first): git diff --stat c11e2e2..HEAD -- src/host/hostMain.ts src/host/rpcServer.ts src/storage/manifests.ts
If any in-scope file changed since this plan was written, compare the
"Current state" excerpts against the live code before proceeding; on a
mismatch, treat it as a STOP condition.

Status

  • Priority: P1
  • Effort: S–M
  • Risk: LOW
  • Depends on: none
  • Category: security
  • Planned at: commit c11e2e2, 2026-06-16

Why this matters

agent-tty gives a caller full control of a real PTY: the RPC server accepts
type, paste, send-keys, run, resize, and signal — i.e. arbitrary
input into your shell. That server listens on a Unix domain socket at a
deterministic, world-traversable path under /tmp/agent-tty/, and the
socket and the session state files are created with no explicit permissions,
so they inherit the process umask. On a shared machine (multi-user dev box, a
shared CI runner) with a permissive umask, another local user can connect to the
socket and drive your session — effectively arbitrary command execution as you —
or read your session manifests and Home Registry. The default umask (022)
happens to block connecting (connect needs write on the socket file), but
relying on ambient umask for an authorization boundary is fragile. This plan
makes the boundary explicit: the per-Home socket directory becomes owner-only
(0o700), the socket file and persisted state files become owner-only
(0o600), regardless of umask.

Current state

Files involved:

  • src/host/hostMain.ts — per-session host entrypoint (runHost); creates the
    socket directory just before the RPC server listens.
  • src/host/rpcServer.tsRpcServer.listen() binds the Unix domain socket.
  • src/storage/manifests.tswriteTextFileAtomic, the single writer used for
    the session manifest and the Home Registry (homes.json).
  • src/storage/sessionPaths.ts — builds the socket path
    /tmp/agent-tty/<sha256(home)[:8]>/<sha256(sessionId)[:12]> (read-only here;
    do not change path construction — it is already traversal-guarded).

Socket directory creation — src/host/hostMain.ts around line 1077 (inside
runHost, mkdir is already imported from node:fs/promises on line 1):

await mkdir(dirname(sPath), { recursive: true });

sPath is the socket path (const sPath = socketPath(sessDir); earlier in
runHost, ~line 143). dirname(sPath) is the per-Home socket directory. The
mkdir passes no mode, so the directory inherits the umask.

Socket bind — src/host/rpcServer.ts:190-229 (server.listen sets no
permissions on the created socket file):

  public async listen(): Promise<void> {
    invariant(this.server === null, 'RPC server is already listening.');

    await this.removeStaleSocketIfNeeded();
    // ... length + existence invariants ...
    const server = net.createServer((socket) => {
      this.handleConnection(socket);
    });
    server.on('error', () => { /* ... */ });
    this.server = server;

    try {
      await new Promise<void>((resolve, reject) => {
        const onError = (error: Error): void => { reject(error); };
        server.once('error', onError);
        server.listen(this.socketPath, () => {
          server.off('error', onError);
          resolve();
        });
      });
    } catch (error) {
      this.server = null;
      throw error;
    }
  }

this.socketPath is a private field set in the constructor. net is imported
at the top of the file; node:fs/promises is not yet imported there.

State-file writer — src/storage/manifests.ts:100-120 (no mode on
writeFile, so manifests and homes.json inherit the umask, typically 0o644
= world-readable):

export async function writeTextFileAtomic(
  options: WriteTextFileAtomicOptions,
): Promise<void> {
  assertAbsoluteStoragePath(options.path, options.pathLabel);

  const outputDirectory = dirname(options.path);
  const temporaryPath = `${options.path}.tmp-${randomUUID()}`;

  try {
    await mkdir(outputDirectory, { recursive: true });
    await writeFile(temporaryPath, options.contents, 'utf8');
    await rename(temporaryPath, options.path);
  } catch (error) {
    await rm(temporaryPath, { force: true }).catch(() => undefined);
    throw makeCliError(ERROR_CODES.STORAGE_WRITE_ERROR, {
      message: options.writeErrorMessage,
      details: { path: options.path },
      cause: error,
    });
  }
}

mkdir, rename, rm, writeFile are imported from node:fs/promises on
line 2 of this file.

Conventions to follow

  • This is strict TypeScript, NodeNext ESM. Imports from TypeScript source use
    .js extensions. Prefer import type for type-only imports.
  • Use the existing invariant helper (src/util/assert.ts) for preconditions;
    match the surrounding small-helper, explicit-control-flow style.
  • chmod is preferred over a mkdir mode option because mkdir's mode
    is masked by the umask, but chmod is not
    chmod guarantees the final
    mode. Octal literals like 0o700 are the standard Node idiom.
  • 2-space indent, single quotes, trailing commas, semicolons (oxfmt enforces).

Design constraints (from CONTEXT.md / AGENTS.md — honor these)

  • The socket path is derived per Home then per Session; a single
    per-Home socket directory holds one socket file per Session. Locking the
    directory to 0o700 is per-Home and is correct for all sessions in that Home.
  • Storage writes must stay inside validated helpers; do not add ad-hoc
    fs permission logic in command code — change it in writeTextFileAtomic
    (the single manifest/registry writer) and in the host socket setup only.

Commands you will need

Purpose Command Expected on success
Install deps aube install exit 0
Typecheck npm run typecheck exit 0, no errors
Lint npm run lint exit 0
Format (fix) npm run format exit 0
Run one test npx vitest run test/integration/<file>.test.ts all pass
Integration set npm run test:integration all pass

Scope

In scope (the only files you should modify):

  • src/host/hostMain.ts — chmod the socket directory after creating it.
  • src/host/rpcServer.ts — chmod the socket file after listen() resolves.
  • src/storage/manifests.ts — write state files with mode: 0o600.
  • A new or existing test file under test/integration/ (see Test plan).

Out of scope (do NOT touch):

  • src/storage/sessionPaths.ts — path construction is already traversal-guarded
    with dirname(x) === root invariants. Do not change it.
  • CHANGELOG.md — automation-owned (Communique/release-please). Never edit it
    in a feature change; a manual edit conflicts with main and breaks CI.
  • The public CLI JSON envelopes / protocol schemas — this change is internal.
  • Windows-specific permission behavior — Windows is tier-2 and Unix mode bits
    do not apply; guard the new test to Unix only (see Test plan).

Git workflow

  • Branch: advisor/001-harden-local-state-permissions
  • Commit message style: Conventional Commits (repo enforces this on PR titles).
    Example from history: fix: drop the component suffix from the release branch name.
    Use e.g. fix: restrict agent-tty socket and state files to the owning user.
  • Do NOT push or open a PR unless the operator instructed it.

Steps

Step 1: Lock the per-Home socket directory to 0o700

In src/host/hostMain.ts:

  1. Add chmod to the existing node:fs/promises import on line 1
    (import { chmod, mkdir } from 'node:fs/promises';).
  2. At the socket-directory creation site (~line 1077), after the existing
    mkdir, add a chmod of that directory to 0o700:
const socketDirectory = dirname(sPath);
await mkdir(socketDirectory, { recursive: true });
await chmod(socketDirectory, 0o700);

(If sPath / dirname(sPath) is already bound to a local variable nearby,
reuse it instead of recomputing — keep one dirname(sPath) expression.)

Verify: npm run typecheck → exit 0, no errors.

Step 2: Lock the socket file to 0o600 after bind

In src/host/rpcServer.ts:

  1. Add an import: import { chmod } from 'node:fs/promises'; (place it with the
    other node: imports at the top).
  2. Inside listen(), immediately after the await new Promise<void>(...) that
    resolves when server.listen(...) succeeds (i.e. after the try/catch that
    binds the socket, before the method returns), chmod the socket file:
await chmod(this.socketPath, 0o600);

Place this after the bind succeeds (the socket file does not exist until
listen resolves). Do not place it inside the catch.

Verify: npm run typecheck → exit 0. Then npm run test:integration
→ all pass (existing RPC/lifecycle integration tests still connect, because the
owner retains read/write).

Step 3: Write persisted state files as 0o600

In src/storage/manifests.ts, change the writeFile call in
writeTextFileAtomic to set an explicit mode:

await writeFile(temporaryPath, options.contents, {
  encoding: 'utf8',
  mode: 0o600,
});

The mode survives the subsequent rename to the final path (rename preserves
the inode and its mode). No other change in this function.

Verify: npm run typecheck → exit 0.

Step 4: Format and full static check

Run npm run format then npm run lint → both exit 0.

Test plan

Add a focused integration test that creates a session and asserts the
permission bits. Model it on an existing integration test that already spins up
a session against an isolated AGENT_TTY_HOME — inspect test/integration/
for one that calls create then destroy (e.g. test/integration/gc.test.ts
or a lifecycle test) and copy its setup/teardown shape (isolated temp home,
absolute AGENT_TTY_HOME, never the real ~/.agent-tty).

New test file: test/integration/socket-permissions.test.ts (or add a case to
the closest existing lifecycle integration test if the maintainer prefers).
Cover:

  • Socket directory is 0o700: after create, locate the per-Home socket
    directory under /tmp/agent-tty/ for the test's Home and assert
    (statSync(dir).mode & 0o777) === 0o700.
  • Socket file is 0o600: assert the bound socket file's
    (mode & 0o777) === 0o600.
  • Manifest is 0o600: after create, assert the session manifest file's
    (mode & 0o777) === 0o600.
  • Owner can still drive the session: a run or inspect against the
    session still succeeds (proves the tightened perms didn't lock out the owner).

Guard the whole suite to Unix: at the top, if (process.platform === 'win32')
skip (use vitest's describe.skipIf(process.platform === 'win32') or an early
it.skip). Mode bits are not meaningful on Windows.

Verification: npx vitest run test/integration/socket-permissions.test.ts
→ all new cases pass.

Done criteria

ALL must hold:

  • npm run typecheck exits 0.
  • npm run lint exits 0.
  • npm run format:check exits 0.
  • npm run test:integration exits 0; the new socket-permissions test exists
    and passes on this (Unix) machine.
  • grep -n "chmod" src/host/hostMain.ts src/host/rpcServer.ts shows the two
    new chmod calls.
  • grep -n "mode: 0o600" src/storage/manifests.ts shows the manifest mode.
  • No files outside the in-scope list are modified (git status).
  • plans/README.md status row updated.

STOP conditions

Stop and report back (do not improvise) if:

  • The code at the "Current state" locations doesn't match the excerpts (drift).
  • An existing integration or e2e test that connects to the socket fails after
    Step 2 — that would mean the owner is being locked out or a non-owner path
    exists you weren't told about. Do not loosen the mode to make it pass.
  • The socket-directory creation is not at/near hostMain.ts:1077, or sPath
    is constructed differently than described.
  • Setting mode: 0o600 on the temp file changes behavior on rename (e.g. a
    test reads the manifest as a different user) — report rather than reverting to
    default mode.

Maintenance notes

  • A reviewer should confirm the chmod on the socket happens after listen
    resolves (the file doesn't exist before then) and that the directory chmod
    uses chmod, not the mkdir mode option (which umask would mask).
  • If a future change moves the socket out of /tmp/agent-tty/ or makes sockets
    per-session-directory instead of per-Home, revisit the directory chmod.
  • Deferred out of this plan: hardening the parent /tmp/agent-tty/ root mode
    (left at default; per-Home 0o700 already prevents traversal into a Home's
    sockets) and any audit-logging of rejected connections. Not needed for the
    boundary this plan establishes.
002-dependency-audit-gate.md

Plan 002: CI fails on high-severity dependency advisories, and the current ones are cleared

Executor instructions: Follow this plan step by step. Run every
verification command and confirm the expected result before moving to the
next step. If anything in the "STOP conditions" section occurs, stop and
report — do not improvise. When done, update the status row for this plan
in plans/README.md — unless a reviewer dispatched you and told you they
maintain the index.

Drift check (run first): git diff --stat c11e2e2..HEAD -- mise.toml .github/workflows/ci.yml package.json aube-lock.yaml
If any in-scope file changed since this plan was written, re-run
aube audit (Step 1) and compare against the advisory list below before
proceeding; on a large mismatch, treat it as a STOP condition.

Status

  • Priority: P1
  • Effort: S–M
  • Risk: LOW
  • Depends on: none
  • Category: security / dx
  • Planned at: commit c11e2e2, 2026-06-16

Why this matters

This repo has no dependency-advisory gate: mise.toml has no audit task and
.github/workflows/ci.yml never audits. As of this writing, aube audit
reports 7 advisories (3 high, 3 moderate, 1 low) that went unnoticed for
exactly that reason. None is a realistic exploit of the shipped CLI — they sit
in transitive build/tooling and non-attacker-facing runtime paths (e.g.
Playwright's own CDP WebSocket, build tools) — but they are trivial to clear and
should not be invisible. The durable win here is the gate: once CI runs
aube audit --audit-level high, any future high/critical advisory in the
dependency tree fails the build instead of silently shipping.

Current state

The advisories, from running aube audit at the repo root (you will re-run this
in Step 1 to get the live list):

Severity Package Vulnerable range Advisory
high esbuild >=0.17.0 <0.28.1 GHSA-gv7w-rqvm-qjhr (binary-integrity / NPM_CONFIG_REGISTRY RCE)
high vite >=8.0.0 <=8.0.15 GHSA-fx2h-pf6j-xcff (server.fs.deny bypass, Windows)
high ws >=8.0.0 <8.21.0 GHSA-96hv-2xvq-fx4p (memory-exhaustion DoS)
moderate vite/launch-editor >=8.0.0 <=8.0.15 GHSA-v6wh-96g9-6wx3 (NTLMv2 hash disclosure, Windows)
moderate ws >=8.0.0 <8.20.1 GHSA-58qx-3vcg-4xpx (uninitialized memory disclosure)
low esbuild >=0.27.3 <0.28.1 GHSA-g7r4-m6w7-qqqr (dev-server arbitrary file read, Windows)

Installed versions (from aube-lock.yaml): esbuild@0.27.7, vite@8.0.11,
ws@8.20.0, brace-expansion@5.0.5 (the brace-expansion moderate ReDoS,
GHSA-jxxr-4gwj-5jf2, also appears in the full audit). All are transitive
none is listed directly in package.json dependencies/devDependencies.

mise.toml defines tasks as [tasks.<name>] with a run = "...". The
aggregate CI task is:

[tasks.ci]
description = "Run CI checks"
run = "mise run format-check && mise run workflow-lint && mise run lint && mise run typecheck && mise run test && mise run build && mise run install-smoke"

There is no [tasks.audit].

.github/workflows/ci.yml — the linux-static job runs a sequence of
mise run … steps (format-check, workflow-lint, lint, typecheck,
validate-bundles, build, install-smoke). It must stay hand-curated (per
AGENTS.md: "Keep .github/workflows/ci.yml hand-curated").

The audit tooling (verified)

aube audit supports:

  • --audit-level <low|moderate|high|critical> — only fail/print at or above a
    severity (default low).
  • --fix=update — refresh the lockfile to patched versions allowed by existing
    version ranges (no package.json changes).
  • --fix=override — write package.json overrides forcing patched versions.
  • --dev — audit only devDependencies.

aube audit mutates aube-lock.yaml / package.json only when --fix is
passed; a bare aube audit is read-only.

Commands you will need

Purpose Command Expected on success
Audit (read-only) aube audit prints advisories
Audit, high+ only aube audit --audit-level high "0 vulnerabilities" at high+
Fix in-range aube audit --fix=update lockfile updated
Fix via overrides aube audit --fix=override package.json + lockfile updated
Install aube install exit 0
Typecheck npm run typecheck exit 0
Build npm run build exit 0
Unit tests npm run test:unit all pass
Lint workflows mise run workflow-lint exit 0
Run a mise task mise run audit (after Step 3)

Scope

In scope:

  • package.json — only if --fix=override adds an overrides/pnpm.overrides
    block to clear advisories.
  • aube-lock.yaml — regenerated by aube audit --fix / aube install.
  • mise.toml — add [tasks.audit] and reference it from [tasks.ci].
  • .github/workflows/ci.yml — add one audit step to the linux-static job.

Out of scope:

  • Bumping the direct dependency majors (playwright, ink, vitest,
    ghostty-web) to chase a transitive — overrides are the surgical fix. If only
    a direct-major bump can clear a high advisory, that is a STOP condition.
  • CHANGELOG.md — automation-owned (Communique/release-please); never edit it.
  • Any src/ code change. This plan is dependency + CI config only.
  • The macOS CI job (quality-gates-macos) — it intentionally omits release-only
    tooling; do not add the audit step there.

Git workflow

  • Branch: advisor/002-dependency-audit-gate
  • Conventional Commits. Example: ci: gate CI on high-severity dependency advisories.
    If overrides are written, a second commit like
    chore(deps): override ws/vite/esbuild to patched versions is fine.
  • Do NOT push or open a PR unless instructed.

Steps

Step 1: Capture the current advisory baseline

Run aube audit and save the output. Confirm it roughly matches the table
above (versions/advisories may have shifted slightly since planning — that's
fine; work from the live list). Then run aube audit --audit-level high and
note exactly which high advisories are reported — those are the ones the
gate (Step 3) will require to be clear.

Verify: aube audit prints a non-empty advisory list including at least one
high.

Step 2: Clear the advisories

  1. Run aube audit --fix=update (patches reachable within existing ranges).
  2. Re-run aube audit --audit-level high. If high advisories remain, run
    aube audit --fix=override to force the patched versions (this writes an
    overrides block to package.json).
  3. Run aube install to ensure the lockfile is consistent.
  4. Re-run aube audit --audit-level high.

Verify: aube audit --audit-level high reports 0 high (and 0 critical)
vulnerabilities
. (Moderate/low may remain — see Maintenance notes.)

Step 3: Confirm nothing broke

The overrides force newer transitive versions; confirm the toolchain still works:

  • npm run typecheck → exit 0.
  • npm run build → exit 0.
  • npm run test:unit → all pass.

If feasible in this environment, also run npm run test:e2e (it exercises the
ghostty-web/Playwright path that pulls vite/esbuild/ws). If e2e can't run here,
note that in your report.

Verify: typecheck, build, and unit tests all green.

Step 4: Add the audit mise task

In mise.toml, add a task (place it near [tasks.lint]):

[tasks.audit]
description = "Fail on high-severity dependency advisories"
run = "aube audit --audit-level high"

Then add mise run audit to the [tasks.ci] chain — put it right after
mise run lint:

[tasks.ci]
description = "Run CI checks"
run = "mise run format-check && mise run workflow-lint && mise run lint && mise run audit && mise run typecheck && mise run test && mise run build && mise run install-smoke"

Verify: mise run audit → exit 0 (matches Step 2's clean high-level audit).

Step 5: Wire the audit into CI

In .github/workflows/ci.yml, in the linux-static job, add a step after
the existing "Lint" step (run: mise run lint):

- name: Audit dependencies
  run: mise run audit

Keep the file hand-curated (don't regenerate it). Do not touch any other job.

Verify: mise run workflow-lint → exit 0 (actionlint + zizmor accept the
new step).

Test plan

This change is config/dependency only; the "tests" are the audit and build
gates themselves:

  • aube audit --audit-level high → 0 high/critical.
  • mise run audit → exit 0.
  • npm run typecheck && npm run build && npm run test:unit → all green
    (proves the forced transitive versions are compatible).
  • mise run workflow-lint → exit 0 (proves the CI edit is valid).

No new unit test file is required.

Done criteria

ALL must hold:

  • aube audit --audit-level high reports 0 high and 0 critical advisories.
  • grep -n "tasks.audit" mise.toml and grep -n "mise run audit" mise.toml
    both match (task defined and in the ci chain).
  • grep -n "Audit dependencies" .github/workflows/ci.yml matches, under the
    linux-static job.
  • mise run workflow-lint exits 0.
  • npm run typecheck, npm run build, npm run test:unit all exit 0.
  • No src/ files modified; no CHANGELOG.md change (git status).
  • plans/README.md status row updated.

STOP conditions

Stop and report back (do not improvise) if:

  • Clearing a high advisory is impossible via --fix=update / --fix=override
    and would require a major bump of a direct dependency (playwright, ink,
    vitest, ghostty-web) — report the residual advisory and its reachability;
    the maintainer decides whether to gate at critical instead or accept the risk.
  • After overrides, npm run build or npm run test:unit fails and a quick,
    in-range version adjustment doesn't fix it (a forced version is incompatible).
  • aube audit is unavailable in your environment (e.g. aube not installed) —
    do not substitute npm audit (the repo has no package-lock.json; npm audit
    errors with ENOLOCK here). Report instead.
  • The live advisory set is wildly different from the table above (e.g. a new
    critical in a direct dependency) — surface it rather than silently fixing.

Maintenance notes

  • The gate is set at high deliberately: it blocks the genuinely actionable
    advisories without making CI hostage to every low-signal transitive moderate.
    If the team wants moderates gated too, change --audit-level high to
    moderate once the current moderates (brace-expansion ReDoS, ws uninitialized
    memory) are also cleared.
  • --fix=override pins transitive versions in package.json. When the upstream
    direct deps catch up to patched transitives, those overrides can be removed —
    a reviewer should periodically check whether the overrides block is still
    needed (aube audit after deleting it).
  • A reviewer should confirm the audit step landed only in linux-static, not in
    quality-gates-macos (which intentionally installs a reduced toolset).
  • Reachability context for the PR description: these advisories are in
    build/tooling and non-attacker-facing runtime paths; the value is the gate and
    hygiene, not an active-exploit fix. State that honestly.
003-fix-release-contract-version.md

Plan 003: RELEASE.md no longer claims a stale "0.2.x" release line

Executor instructions: Follow this plan step by step. Run every
verification command and confirm the expected result before moving to the
next step. If anything in the "STOP conditions" section occurs, stop and
report — do not improvise. When done, update the status row for this plan
in plans/README.md — unless a reviewer dispatched you and told you they
maintain the index.

Drift check (run first): git diff --stat c11e2e2..HEAD -- RELEASE.md package.json
If RELEASE.md changed since this plan was written, compare the "Current
state" excerpt against the live file before editing; on a mismatch, treat it
as a STOP condition.

Status

  • Priority: P2
  • Effort: S
  • Risk: LOW
  • Depends on: none
  • Category: docs
  • Planned at: commit c11e2e2, 2026-06-16

Why this matters

RELEASE.md is the user-facing support contractREADME.md links to it
("The supported contract is in RELEASE.md"). Its opening still says the
document covers the "current 0.2.x release line" and calls 0.2.0 "the first
stable cut", but the product is at 0.4.3 (package.json) and the project now
releases via release-please, which bumps the version automatically. A reader
checking what's supported sees a version line that is two minor releases stale.
The body of the contract is capability-based and still accurate; only the
version framing in the first two lines is wrong. Making that framing
version-agnostic fixes the drift and prevents it from recurring on the next
release-please bump.

Current state

RELEASE.md:1-7 (the only stale part — the rest of the file is
capability-based and correct):

# agent-tty release contract

This document defines the supported product contract for the current `0.2.x` release line.
The `0.1.x` beta line established the baseline for isolated, reviewable terminal automation for real TUI workflows, and `0.2.0` is the first stable cut on top of that baseline; later `0.2.x` releases may add compatible fixes and features without widening this core support contract.
If a workflow depends on behavior outside this document, treat it as future-scope or best-effort rather than a guaranteed capability.

For per-release changes, see [`CHANGELOG.md`](./CHANGELOG.md). For release mechanics, use [`docs/RELEASE-PROCESS.md`](./docs/RELEASE-PROCESS.md). For reviewer-facing proof bundles, start with [`dogfood/CATALOG.md`](./dogfood/CATALOG.md).

package.json:3 is "version": "0.4.3". The linked files
docs/RELEASE-PROCESS.md, CHANGELOG.md, and dogfood/CATALOG.md all exist
(verified) — do not change those links.

The rest of RELEASE.md (the "Supported capabilities", "Explicitly out of
scope", "Known limitations", "Validation" sections, lines 9-39) is accurate and
must not change — note that line 20 already correctly references the shipped
libghostty-vt semantic renderer.

Conventions to follow

  • Markdown prose; oxfmt formats *.md (see mise.toml format-check sources),
    so run the formatter after editing.
  • Keep the contract capability-based and version-agnostic so release-please
    version bumps don't re-stale it. Do not hardcode 0.4.x (it would drift
    again); describe the contract without pinning a release-line number.

Commands you will need

Purpose Command Expected on success
Format (fix) npm run format exit 0
Format check npm run format:check exit 0

Scope

In scope:

  • RELEASE.md — only lines 3-4 (the version framing).

Out of scope:

  • The capability/limitation/validation sections of RELEASE.md (lines 9-39).
  • The links on line 7 (all targets exist).
  • README.md, CHANGELOG.md (automation-owned), package.json, and any
    release workflow.

Git workflow

  • Branch: advisor/003-fix-release-contract-version
  • Conventional Commits. Example: docs: make the RELEASE.md support contract version-agnostic.
  • Do NOT push or open a PR unless instructed.

Steps

Step 1: Make the opening version-agnostic

Replace lines 3-4 of RELEASE.md with version-agnostic phrasing. Target text
(keep line 5 — "If a workflow depends…" — and everything below unchanged):

This document defines the supported product contract for the current stable release line.
It builds on the `0.1.x` beta baseline for isolated, reviewable terminal automation of real TUI workflows; later stable releases may add compatible fixes and features without widening this core support contract.

(The exact wording can vary, but it must not name a specific 0.2.x/0.x
"current" release line. The first stable baseline reference to 0.1.x is
historically accurate and fine to keep.)

Verify: grep -n "0.2" RELEASE.md → returns nothing (no remaining 0.2.x
/ 0.2.0 references).

Step 2: Format

Run npm run format, then npm run format:check → exit 0.

Verify: npm run format:check → exit 0.

Test plan

No code; the checks are:

  • grep -n "0\.2\.[0-9x]" RELEASE.md → no matches.
  • npm run format:check → exit 0.
  • Manual read: lines 9-39 are unchanged from the current file.

Done criteria

ALL must hold:

  • grep -nE "0\.2\.[0-9x]" RELEASE.md returns no matches.
  • RELEASE.md no longer contains the phrase "first stable cut" tied to a
    version (or any "current 0.x.y release line" claim).
  • npm run format:check exits 0.
  • Only RELEASE.md is modified (git status); no CHANGELOG.md change.
  • plans/README.md status row updated.

STOP conditions

Stop and report back if:

  • RELEASE.md's opening no longer matches the excerpt above (it was already
    edited).
  • Any of the links on line 7 point to a file that no longer exists
    (ls docs/RELEASE-PROCESS.md CHANGELOG.md dogfood/CATALOG.md) — that's a
    separate doc-rot finding; report it, don't fix it here.

Maintenance notes

  • Keeping the contract version-agnostic means future release-please bumps won't
    re-stale this file. If the team later wants an explicit version stamp, the
    durable way is a release-please-managed marker (like the
    <!-- x-release-please-version --> comment used in README.md) rather than
    hand-edited prose — that's a deliberate follow-up, not part of this plan.
004-hostmain-characterization-tests.md

Plan 004: hostMain's pure decision helpers and the idle-timeout path are covered by tests

Executor instructions: Follow this plan step by step. Run every
verification command and confirm the expected result before moving to the
next step. If anything in the "STOP conditions" section occurs, stop and
report — do not improvise. When done, update the status row for this plan
in plans/README.md — unless a reviewer dispatched you and told you they
maintain the index.

Drift check (run first): git diff --stat c11e2e2..HEAD -- src/host/hostMain.ts test/unit/host/hostMain.test.ts
If src/host/hostMain.ts changed since this plan was written, compare the
"Current state" excerpts against the live code before proceeding; on a
mismatch, treat it as a STOP condition.

Status

  • Priority: P2
  • Effort: M
  • Risk: LOW
  • Depends on: none
  • Category: tests
  • Planned at: commit c11e2e2, 2026-06-16

Why this matters

src/host/hostMain.ts (1094 lines) is the per-session orchestration core — it
owns the PTY, event log, renderer polling, RPC dispatch, idle timeout, and
shutdown. Its entire unit test today is 9 lines asserting one exported
constant (test/unit/host/hostMain.test.ts). The happy path is exercised
indirectly by integration/e2e tests (which run the real CLI), but the file's
decision helpers — exit-signal normalization, the commandability predicate
that gates every input/control RPC, and renderer-name resolution with its
env/default fallback — have no targeted tests, and one observable orchestration
branch (idle-timeout auto-exit) has no dedicated coverage. These are exactly the
small, branch-y functions where a regression slips through "the integration
test still passed". Characterizing them now pins the current behavior and makes
later refactors safe.

Current state

src/host/hostMain.ts is one large runHost(sessionId) function with inner
closures, plus a handful of module-level pure helpers near the top. Only
MAX_CONSECUTIVE_POLL_FAILURES is currently exported:

// src/host/hostMain.ts
export const MAX_CONSECUTIVE_POLL_FAILURES = 10; // line 77

function normalizeExitSignal(signal: number | null): string | null {
  // line 87
  invariant(
    signal === null || (Number.isInteger(signal) && signal >= 0),
    'PTY exit signal must be a non-negative integer or null',
  );
  return signal === null || signal === 0 ? null : String(signal);
}

function isSessionCommandable(state: SessionState): boolean {
  // line 96
  return isCommandableSessionStatus(state.snapshot().status);
}

function assertSessionCommandable(state: SessionState): void {
  // line 100
  if (!isSessionCommandable(state)) {
    throw makeCliError(ERROR_CODES.SESSION_NOT_RUNNING, {
      message: 'Session is not running.',
    });
  }
}

function resolveHostRendererName(input: string | undefined): RendererName {
  // line 116
  const rawRenderer =
    input ??
    process.env[HOST_RENDERER_ENV_KEY] ??
    process.env.AGENT_TTY_RENDERER ??
    DEFAULT_RENDERER_NAME;
  try {
    return resolveRendererName(rawRenderer);
  } catch (error) {
    throw makeCliError(ERROR_CODES.INVALID_INPUT, {
      message: 'Renderer must be one of: ghostty-web, libghostty-vt.',
      details: { renderer: rawRenderer },
      cause: error,
    });
  }
}

Relevant imports already in hostMain.ts:

  • SessionState from ./sessionState.js (line 15)
  • isCommandableSessionStatus from ../protocol/sessionStatusPolicy.js (line 20)
    — a pure predicate: isCommandableSessionStatus(status: SessionStatus): boolean
    (src/protocol/sessionStatusPolicy.ts:111). Commandable statuses are the
    running-family per CONTEXT.md ("A running Session is Commandable"; an
    exiting/destroying/terminal Session is not).
  • resolveRendererName, DEFAULT_RENDERER_NAME, RendererName (lines 42-44),
    HOST_RENDERER_ENV_KEY (line 40), ERROR_CODES/makeCliError (line 19).

Conventions to follow

  • Tests use vitest (describe/it/expect). See the existing host tests
    in test/unit/host/ for structure — runCompletionCoordinator.test.ts and
    eventLog.test.ts are substantial, idiomatic examples.
  • To build a SessionState test double for the commandability tests, model on
    test/unit/commands/gc.test.ts
    , which already constructs SessionState
    instances — reuse that exact construction shape rather than inventing one.
  • Asserting a thrown CliError: check the .code against ERROR_CODES (e.g.
    ERROR_CODES.SESSION_NOT_RUNNING, ERROR_CODES.INVALID_INPUT). Look at an
    existing test that asserts on a thrown CliError for the pattern.
  • When a test mutates process.env, save and restore it (beforeEach/
    afterEach) so it doesn't leak into other tests.
  • Strict TS, NodeNext ESM, .js import extensions, import type for types.
  • Exporting an internal helper purely for testing is an accepted pattern here —
    MAX_CONSECUTIVE_POLL_FAILURES is already exported for exactly that reason.

Commands you will need

Purpose Command Expected
Typecheck npm run typecheck exit 0
Lint npm run lint exit 0
Run the new unit test npx vitest run test/unit/host/hostMain.test.ts all pass
Unit suite npm run test:unit all pass
Integration suite npm run test:integration all pass

Scope

In scope:

  • src/host/hostMain.ts — add export to the four pure helpers only
    (normalizeExitSignal, isSessionCommandable, assertSessionCommandable,
    resolveHostRendererName). No logic changes.
  • test/unit/host/hostMain.test.ts — expand with the new unit tests.
  • test/integration/ — one new test for the idle-timeout path (Step 3), or a
    case added to test/integration/lifecycle.test.ts.

Out of scope:

  • Any behavior change in hostMain.ts. This plan only adds export and
    adds tests. If you find yourself changing logic, stop.
  • Refactoring runHost or extracting the inner closures (that is plan 006's
    territory, and not required here).
  • CHANGELOG.md (automation-owned).
  • The protocol schemas / CLI envelopes.

Git workflow

  • Branch: advisor/004-hostmain-characterization-tests
  • Conventional Commits. Example: test: characterize hostMain decision helpers and idle-timeout exit.
  • Do NOT push or open a PR unless instructed.

Steps

Step 1: Export the four pure helpers

In src/host/hostMain.ts, add the export keyword to normalizeExitSignal
(line 87), isSessionCommandable (96), assertSessionCommandable (100), and
resolveHostRendererName (116). Change nothing else.

Verify: npm run typecheck → exit 0. npm run lint → exit 0.

Step 2: Unit-test the helpers

Rewrite test/unit/host/hostMain.test.ts to keep the existing
MAX_CONSECUTIVE_POLL_FAILURES assertion and add describe blocks:

  • normalizeExitSignal:
    • nullnull
    • 0null
    • 9'9', 15'15'
    • a negative or non-integer signal → throws (invariant). Assert it throws.
  • isSessionCommandable / assertSessionCommandable (build SessionState
    per test/unit/commands/gc.test.ts):
    • a running SessionState → isSessionCommandable is true;
      assertSessionCommandable does not throw.
    • a terminal/exited (and an exiting) SessionState → isSessionCommandable
      is false; assertSessionCommandable throws a CliError with code
      ERROR_CODES.SESSION_NOT_RUNNING and message 'Session is not running.'.
  • resolveHostRendererName (save/restore process.env around each case):
    • explicit input 'libghostty-vt' → resolves to that name.
    • input undefined with HOST_RENDERER_ENV_KEY set → resolves from the env var.
    • input undefined, no env → resolves to DEFAULT_RENDERER_NAME.
    • an invalid name (e.g. 'nope') → throws a CliError with code
      ERROR_CODES.INVALID_INPUT.

Verify: npx vitest run test/unit/host/hostMain.test.ts → all pass
(the original constant test plus the new ones).

Step 3: Integration-test the idle-timeout exit branch

create exposes --idle-timeout-ms <ms> (src/cli/main.ts:326). Add a test
(model on test/integration/lifecycle.test.ts, which already drives create/
inspect/destroy against an isolated absolute AGENT_TTY_HOME):

  • Create a session with a small idle timeout (pick a value comfortably above the
    internal idle-check cadence — note IDLE_CHECK_CAP_MS = 5_000 in
    hostMain.ts, so the poll cadence is bounded at 5s; choose a timeout and a
    wait that are robust to that, e.g. a timeout of a few hundred ms and then poll
    inspect until the status is terminal, with a generous overall deadline).
  • Assert the session reaches a terminal status (exited) via inspect --json
    without any further input.
  • Use the same isolated-home setup/teardown as the neighboring tests; never
    touch the real ~/.agent-tty.

If the idle-timeout behavior is not cleanly observable via inspect within a
reasonable, non-flaky wait, stop and report (see STOP conditions) rather
than adding a sleep-and-hope test — the unit tests in Step 2 are the required
core; this integration test is the bonus branch.

Verify: npx vitest run test/integration/<your-file>.test.ts → passes.
Run it a second time to confirm it is not flaky.

Step 4: Full static + suites

npm run lint, npm run typecheck, npm run test:unit, then
npm run test:integration → all green.

Test plan

  • New unit cases (Step 2): normalizeExitSignal (4+ cases incl. throw),
    commandability predicate + assertion (running / exiting / terminal),
    renderer-name resolution (explicit / env / default / invalid-throws).
  • New integration case (Step 3): idle-timeout auto-exit observed via inspect.
  • Structural patterns: unit → model on test/unit/host/runCompletionCoordinator.test.ts
    and test/unit/commands/gc.test.ts (for SessionState); integration → model
    on test/integration/lifecycle.test.ts.
  • Verification: npm run test:unit and npm run test:integration both pass,
    including the new cases.

Done criteria

ALL must hold:

  • grep -nE "^export function (normalizeExitSignal|isSessionCommandable|assertSessionCommandable|resolveHostRendererName)" src/host/hostMain.ts → 4 matches.
  • npx vitest run test/unit/host/hostMain.test.ts passes with the new cases
    (and still asserts MAX_CONSECUTIVE_POLL_FAILURES === 10).
  • npm run test:unit and npm run test:integration exit 0.
  • npm run typecheck and npm run lint exit 0.
  • git diff src/host/hostMain.ts shows only added export keywords (no
    logic change).
  • No CHANGELOG.md change; no files outside scope modified (git status).
  • plans/README.md status row updated.

STOP conditions

Stop and report back if:

  • The four helpers are no longer at the cited lines / no longer match the
    excerpts (drift).
  • Building a SessionState for the commandability tests requires more than the
    shape used in test/unit/commands/gc.test.ts (e.g. a live PTY) — report and
    scope those two cases out rather than constructing a heavy fake.
  • The idle-timeout integration test (Step 3) can only be made to pass with a
    fixed sleep and is flaky on a second run — drop Step 3, keep Steps 1–2, and
    report that Step 3 needs a deterministic hook.
  • Asserting a thrown CliError's .code doesn't work as described (the error
    shape differs) — report the actual shape.

Maintenance notes

  • These are characterization tests: they pin current behavior. If a future
    change intentionally alters, say, commandability rules, the test should be
    updated deliberately in the same change — a failure here on an unrelated PR is
    a real regression signal.
  • The deeper orchestration branches inside runHost (renderer-poll-failure
    recovery, shutdown reconciliation, concurrent-wait handling) remain
    unit-untestable without extracting them from the closure. That extraction is
    deliberately not in this plan; it's a candidate follow-up that would pair
    well with plan 006's refactoring approach.

Plans 005 and 006 follow in a comment below (GitHub PR body size limit).

🤖 Generated with Claude Code

ThomasK33 and others added 6 commits June 16, 2026 16:40
Change-Id: I3a53be8ace00ec7df33a6e995ff82d373001c20f
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ones

Add a [tasks.audit] mise task (aube audit --audit-level high) and wire it
into both the [tasks.ci] chain and the linux-static CI job (after Lint).
This fails the build on any future high/critical dependency advisory.

Clear the 7 current advisories (3 high, 3 moderate, 1 low) by adding
package.json overrides forcing patched transitive versions:
brace-expansion 5.0.6, esbuild 0.28.1, vite 8.0.16, ws 8.21.0. aube audit
now reports 0 vulnerabilities at all severities.

Change-Id: Iff1b8f8043963786d348dd87939dab7b3df865bc
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Plan 001: restrict the per-Home socket directory to 0o700, the bound
socket file to 0o600, and persisted state files (manifests, homes.json)
to 0o600, regardless of umask. Adds an integration test asserting the
permission bits and that the owner can still drive the session.

Plan 004: export hostMain's pure decision helpers (normalizeExitSignal,
isSessionCommandable, assertSessionCommandable, resolveHostRendererName)
and add characterization unit tests plus an idle-timeout auto-exit
integration test. No logic changes to hostMain.

Change-Id: I4d8f8425e631752cf00ce653bd5654f8e86e230e
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Plan 005: extract the replay-event seq/ordering invariants into a single
tested helper (src/renderer/replayEvents.ts) and use it from both renderer
backends' replayTo so they iterate replay events through one shared path.

Plan 006: move the embedded harness HTML and the harness-decoding layer
out of the ghostty-web backend god file into sibling modules
(embeddedHarnessHtml.ts, harnessDecoding.ts), cutting backend.ts from
2798 to ~1620 lines. Pure move-and-reimport; the three externally-consumed
decode symbols are re-exported from backend.ts to keep importers resolving.

Change-Id: Ib38d070fcb1289bc78556b8d1100e5cc9b4578a6
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
001 harden local-state permissions (socket 0o700, state files 0o600)
002 CI dependency-audit gate + clear 7 advisories
003 RELEASE.md support contract version-agnostic
004 characterize hostMain decision helpers + idle-timeout
005 share replay-event iteration across renderer backends
006 extract harness HTML + decoding from ghostty-web backend

Change-Id: Ife225ce75822896f8c7826639dc6d9ed2e00b64f
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Iacec0ce02995b6b64254c68d3c9301a52f7c7f89
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33

Copy link
Copy Markdown
Member Author

The remaining two plan documents, continued from the pull request description:

005-dedupe-replay-event-iteration.md

Plan 005: Both renderer backends iterate replay events through one shared, tested helper

Executor instructions: Follow this plan step by step. Run every
verification command and confirm the expected result before moving to the
next step. If anything in the "STOP conditions" section occurs, stop and
report — do not improvise. When done, update the status row for this plan
in plans/README.md — unless a reviewer dispatched you and told you they
maintain the index.

Drift check (run first): git diff --stat c11e2e2..HEAD -- src/renderer/ghosttyWeb/backend.ts src/renderer/libghosttyVt/backend.ts
If either backend changed since this plan was written, compare the "Current
state" excerpts against the live code before proceeding; on a mismatch, treat
it as a STOP condition.

Status

  • Priority: P2
  • Effort: M
  • Risk: MED
  • Depends on: none
  • Category: tech-debt
  • Planned at: commit c11e2e2, 2026-06-16

Why this matters

The two renderer backends — ghosttyWeb (visual/Playwright) and libghosttyVt
(native semantic) — each re-implement the same replay-event iteration:
validate each event's sequence is a non-negative integer, enforce strictly
increasing order, skip events already applied (seq <= lastAppliedSeq), stop at
the target (seq > targetSeq), then dispatch on event type. This logic is
correctness-critical: per the architecture's tiered-truth model, both backends
must converge on the same visible screen content and Screen Hash for the
same event log. The two copies have already drifted — different assertion
messages, and ghosttyWeb flushes its output batch before breaking while
libghosttyVt just breaks — which is exactly how a subtle divergence creeps in.
Extracting the shared iteration into one tested helper removes the duplication
and gives the seq/ordering invariants a single home.

Current state

Both replayTo methods contain a near-identical loop. The shared part is the
per-event validation + filtering scaffolding; each backend keeps its own
feed strategy (ghosttyWeb batches output and awaits async bridge calls;
libghosttyVt feeds synchronously).

src/renderer/ghosttyWeb/backend.ts:1602-1664 (async, batched output;
note the flush at 1618 before the targetSeq break and the unconditional flush at
1664 after the loop):

for (const event of input.events) {
  assertNonNegativeInteger(
    event.seq,
    'replay event seq must be a non-negative integer',
  );
  invariant(
    event.seq > previousEventSeq,
    'replay events must be ordered by strictly increasing seq values',
  );
  previousEventSeq = event.seq;

  if (event.seq <= this.lastAppliedSeq) {
    continue;
  }
  if (event.seq > input.targetSeq) {
    await flushOutputBatch();
    break;
  }

  switch (event.type) {
    case 'output': {
      pendingOutputChunks.push(event.payload.data);
      break;
    }
    case 'resize': {
      await flushOutputBatch();
      /* assert + resizeBridge + set cols/rows */ break;
    }
    case 'marker': {
      await flushOutputBatch();
      break;
    }
    case 'input_text':
    case 'input_paste':
    case 'input_keys':
    case 'input_run':
    case 'run_complete':
    case 'signal':
    case 'exit': {
      await flushOutputBatch();
      break;
    }
    default: {
      unreachable(event, 'unsupported replay event type');
    }
  }
  highestProcessedSeq = event.seq;
}

await flushOutputBatch(); // line 1664 — flushes any pending output after the loop

src/renderer/libghosttyVt/backend.ts:486-535 (synchronous feed):

for (const event of input.events) {
  assertNonNegativeInteger(event.seq, 'replay event seq must be non-negative');
  invariant(
    event.seq > previousEventSeq,
    'replay events must be ordered by strictly increasing seq values',
  );
  previousEventSeq = event.seq;

  if (event.seq <= this.lastAppliedSeq) {
    continue;
  }
  if (event.seq > input.targetSeq) {
    break;
  }

  switch (event.type) {
    case 'output':
      terminal.feed(event.payload.data);
      break;
    case 'resize':
      /* assert + terminal.resize + set cols/rows */ break;
    case 'marker':
    case 'input_text':
    case 'input_paste':
    case 'input_keys':
    case 'input_run':
    case 'run_complete':
    case 'signal':
    case 'exit':
      break;
    default:
      unreachable(event, 'unsupported replay event type');
  }
  highestProcessedSeq = event.seq;
}

Both surround the loop with: let previousEventSeq = -1; let highestProcessedSeq = this.lastAppliedSeq; before, and after the loop the
if (highestProcessedSeq < 0) { highestProcessedSeq = input.targetSeq; } +
this.lastAppliedSeq = highestProcessedSeq; sequence.

Critical detail — previousEventSeq is updated BEFORE the skip/stop checks
in both, so the strictly-increasing invariant covers every event, including
skipped ones. The shared helper must preserve that exact order.

Types and conventions

  • ReplayInput is exported from src/renderer/types.ts. The event element type
    is ReplayInput['events'][number] — use that indexed type so you don't depend
    on the exact exported name.
  • invariant is in src/util/assert.ts. unreachable(value, message) (same
    module) is used for the exhaustive default — leave each backend's default: unreachable(...) in place; it belongs with the type switch, not the iterator.
  • Strict TS, NodeNext ESM, .js import extensions, import type for types.
  • 2-space indent, single quotes, trailing commas (oxfmt enforces).

Design constraint (honor this)

From design/ARCHITECTURE.md (tiered-truth model): semantic-renderer truth and
reference-visual truth must agree on visible content. The CONTEXT.md Screen
Hash term states the Screen Hash is computed from the same canonical visible
text the stability check and text waits use, "so the three never disagree." Any
change to replay iteration must keep both backends producing identical visible
content for the same events — that is what test/integration/screen-hash.test.ts
guards.

Commands you will need

Purpose Command Expected
Typecheck npm run typecheck exit 0
Lint npm run lint exit 0
Replay unit tests npx vitest run test/unit/host/replay.test.ts test/unit/renderer all pass
New helper test npx vitest run test/unit/renderer/replayEvents.test.ts all pass
Screen-hash conv. npx vitest run test/integration/screen-hash.test.ts all pass
e2e (cross-backend) npm run test:e2e all pass

Scope

In scope:

  • src/renderer/replayEvents.ts (create) — the shared iterator helper.
  • src/renderer/ghosttyWeb/backend.ts — use the helper in replayTo.
  • src/renderer/libghosttyVt/backend.ts — use the helper in replayTo.
  • test/unit/renderer/replayEvents.test.ts (create) — unit-test the helper.

Out of scope (do NOT change behavior here):

  • The output-feed strategy in either backend (ghosttyWeb's batching +
    flushOutputBatch; libghosttyVt's synchronous terminal.feed). Keep every
    existing await flushOutputBatch() call site in ghosttyWeb intact.
  • The highestProcessedSeq tracking and the < 0 → targetSeq fallback and
    this.lastAppliedSeq = … assignment — leave these in each backend unchanged.
  • The pre-loop input validation (assertPositiveInteger on initial dims,
    targetSeq checks, the no-rewind invariant with its backend-specific message).
  • replayWithTiming (ghosttyWeb) — only replayTo is in scope.
  • CHANGELOG.md (automation-owned).

Git workflow

  • Branch: advisor/005-dedupe-replay-event-iteration
  • Conventional Commits. Example: refactor: share replay-event iteration across renderer backends.
  • Do NOT push or open a PR unless instructed.

Steps

Step 1: Create the shared iterator

Create src/renderer/replayEvents.ts:

import type { ReplayInput } from './types.js';
import { invariant } from '../util/assert.js';

type ReplayEvent = ReplayInput['events'][number];

/**
 * Yield the replay events that fall in the half-open range
 * (lastAppliedSeq, targetSeq], in order. Enforces the seq invariants shared by
 * every renderer backend: each event seq is a non-negative integer, seqs are
 * strictly increasing across ALL events (including skipped ones), events at or
 * below lastAppliedSeq are skipped, and iteration stops at the first event
 * beyond targetSeq. Callers dispatch on event.type and own how output is fed.
 */
export function* iterateInRangeReplayEvents(
  input: ReplayInput,
  lastAppliedSeq: number,
): Generator<ReplayEvent> {
  let previousEventSeq = -1;
  for (const event of input.events) {
    invariant(
      Number.isInteger(event.seq) && event.seq >= 0,
      'replay event seq must be a non-negative integer',
    );
    invariant(
      event.seq > previousEventSeq,
      'replay events must be ordered by strictly increasing seq values',
    );
    previousEventSeq = event.seq;

    if (event.seq <= lastAppliedSeq) {
      continue;
    }
    if (event.seq > input.targetSeq) {
      return;
    }
    yield event;
  }
}

Verify: npm run typecheck → exit 0.

Step 2: Use the helper in libghosttyVt (the simpler, synchronous backend first)

In src/renderer/libghosttyVt/backend.ts replayTo, replace the
for (const event of input.events) { …validation…; if skip; if break; switch }
loop with:

for (const event of iterateInRangeReplayEvents(input, this.lastAppliedSeq)) {
  switch (event.type) {
    case 'output':
      terminal.feed(event.payload.data);
      break;
    case 'resize':
      assertPositiveInteger(
        event.payload.cols,
        'resize event cols must be positive',
      );
      assertPositiveInteger(
        event.payload.rows,
        'resize event rows must be positive',
      );
      terminal.resize(event.payload.cols, event.payload.rows);
      this.currentCols = event.payload.cols;
      this.currentRows = event.payload.rows;
      break;
    case 'marker':
    case 'input_text':
    case 'input_paste':
    case 'input_keys':
    case 'input_run':
    case 'run_complete':
    case 'signal':
    case 'exit':
      break;
    default:
      unreachable(event, 'unsupported replay event type');
  }
  highestProcessedSeq = event.seq;
}

Keep let previousEventSeq = -1; removed only if it is now unused (the helper
owns it) — delete the now-dead previousEventSeq declaration and assignment.
Keep highestProcessedSeq and everything after the loop unchanged. Add the
import: import { iterateInRangeReplayEvents } from '../replayEvents.js';.

Verify: npm run typecheck → exit 0;
npx vitest run test/unit/host/replay.test.ts test/unit/renderer → all pass.

Step 3: Use the helper in ghosttyWeb (preserve every flush)

In src/renderer/ghosttyWeb/backend.ts replayTo, replace the loop the same
way, keeping all await flushOutputBatch() calls inside the switch and the
one after the loop (current line 1664)
. The targetSeq break previously flushed
then broke (line 1618); the helper now stops iteration at that boundary and the
post-loop await flushOutputBatch() (1664) flushes any pending output — net
behavior identical. Result:

for (const event of iterateInRangeReplayEvents(input, this.lastAppliedSeq)) {
  switch (event.type) {
    case 'output': {
      pendingOutputChunks.push(event.payload.data);
      break;
    }
    case 'resize': {
      await flushOutputBatch();
      assertPositiveInteger(
        event.payload.cols,
        'resize event cols must be a positive integer',
      );
      assertPositiveInteger(
        event.payload.rows,
        'resize event rows must be a positive integer',
      );
      await this.resizeBridge(page, event.payload.cols, event.payload.rows);
      this.currentCols = event.payload.cols;
      this.currentRows = event.payload.rows;
      break;
    }
    case 'marker': {
      await flushOutputBatch();
      break;
    }
    case 'input_text':
    case 'input_paste':
    case 'input_keys':
    case 'input_run':
    case 'run_complete':
    case 'signal':
    case 'exit': {
      await flushOutputBatch();
      break;
    }
    default: {
      unreachable(event, 'unsupported replay event type');
    }
  }
  highestProcessedSeq = event.seq;
}

await flushOutputBatch();

Delete the now-dead previousEventSeq declaration. Add the import:
import { iterateInRangeReplayEvents } from '../replayEvents.js';. Leave
highestProcessedSeq, the < 0 → targetSeq fallback, snapshot read, and return
unchanged.

Verify: npm run typecheck → exit 0.

Step 4: Cross-backend behavior gates

Run the convergence and visual suites — these are the real safety net:

  • npx vitest run test/integration/screen-hash.test.ts → all pass (both
    backends still produce the same canonical visible text / Screen Hash).
  • npm run test:e2e → all pass (rendered output, screenshots, casts unchanged).
    If e2e cannot run in this environment, say so explicitly in your report.

Verify: screen-hash convergence and e2e both green.

Test plan

  • New unit test test/unit/renderer/replayEvents.test.ts for
    iterateInRangeReplayEvents, covering:
    • all events in range → yields all, in order.
    • events with seq <= lastAppliedSeq → skipped (not yielded).
    • an event with seq > targetSeq → iteration stops there (it and everything
      after are not yielded).
    • out-of-order seq (e.g. [0,2,1]) → throws the strictly-increasing invariant.
    • a negative / non-integer seq → throws the non-negative-integer invariant.
    • the strictly-increasing check fires even when the offending event would be
      skipped (e.g. lastAppliedSeq high, but a later event repeats an earlier seq).
  • Regression coverage is the existing test/unit/host/replay.test.ts,
    test/unit/renderer/*, test/integration/screen-hash.test.ts, and e2e — they
    must stay green with no edits.

Done criteria

ALL must hold:

  • src/renderer/replayEvents.ts exists and exports iterateInRangeReplayEvents.
  • Both backends' replayTo import and use it;
    grep -n "iterateInRangeReplayEvents" src/renderer/ghosttyWeb/backend.ts src/renderer/libghosttyVt/backend.ts → 2+ matches.
  • grep -n "previousEventSeq" src/renderer/ghosttyWeb/backend.ts src/renderer/libghosttyVt/backend.ts → no matches (dead declarations removed).
  • npm run typecheck and npm run lint exit 0.
  • npx vitest run test/unit/renderer/replayEvents.test.ts passes (new cases).
  • npx vitest run test/unit/host/replay.test.ts test/unit/renderer passes.
  • npx vitest run test/integration/screen-hash.test.ts passes.
  • npm run test:e2e passes (or its inability to run here is reported).
  • No CHANGELOG.md change; no out-of-scope files modified (git status).
  • plans/README.md status row updated.

STOP conditions

Stop and report back (do not improvise) if:

  • Either backend's loop no longer matches the "Current state" excerpts (drift).
  • test/integration/screen-hash.test.ts fails after the change — that means the
    backends diverged; do NOT adjust the test to pass. Report the diff.
  • Removing a flush or reordering one in ghosttyWeb seems necessary to make the
    helper fit — it isn't; keep every flush. If you can't preserve them, stop.
  • e2e output (screenshots/casts) changes — that's a behavior regression, not a
    refactor; report it.

Maintenance notes

  • The helper is the single home for replay seq/ordering invariants. Future event
    types must be added to each backend's switch (the exhaustive default: unreachable will flag a missing case at type-check time) — the iterator does
    not need changes for new event types.
  • A reviewer should confirm: (1) previousEventSeq updates before the skip in
    the helper (covers skipped events); (2) ghosttyWeb still flushes after the loop;
    (3) lastAppliedSeq/highestProcessedSeq math is untouched in both backends.
  • Deferred: the pre-loop input validation and the highestProcessedSeq fallback
    are also duplicated but were intentionally left in place to bound this change's
    risk; consolidating them is a possible follow-up once this lands cleanly.
006-split-ghostty-web-backend.md

Plan 006: Extract the harness HTML and harness-decoding layer out of the ghostty-web backend god file

Executor instructions: Follow this plan step by step. Run every
verification command and confirm the expected result before moving to the
next step. If anything in the "STOP conditions" section occurs, stop and
report — do not improvise. When done, update the status row for this plan
in plans/README.md — unless a reviewer dispatched you and told you they
maintain the index.

Drift check (run first): git diff --stat c11e2e2..HEAD -- src/renderer/ghosttyWeb/backend.ts
If backend.ts changed since this plan was written, re-locate the symbols by
name (line numbers below will have shifted) and confirm they still match the
descriptions before proceeding; on a structural mismatch, treat it as a STOP
condition.

Status

  • Priority: P3
  • Effort: L
  • Risk: MED
  • Depends on: none (the existing renderer unit + e2e tests are the safety net)
  • Category: tech-debt
  • Planned at: commit c11e2e2, 2026-06-16

Why this matters

src/renderer/ghosttyWeb/backend.ts is 2814 lines — by far the largest file
in the repo — and mixes five unrelated concerns in one module: a ~790-line
embedded HTML/JS harness string, the harness-snapshot decoding/validation layer,
generic assertion helpers, a local HTTP server, and the renderer class itself.
That makes it hard to navigate, hard to test in isolation, and hard for an agent
to change safely. This plan removes the two safe, high-value chunks — the
embedded harness HTML and the harness-decoding free functions — into sibling
modules, cutting the file roughly in half. It deliberately does not touch the
HTTP-server/bridge class methods (which hold this state and need a more careful
refactor); those are a separate follow-up. No runtime behavior changes: this is a
pure move-and-reimport.

Current state

backend.ts layout (line numbers approximate — locate by symbol name):

  • 48–110: interfaces. Of these, the snapshot interfaces move; the
    server/bridge interfaces stay (see Scope):
    • move: GhosttyHarnessVisibleLine (48), GhosttyHarnessSnapshotCell (53),
      GhosttyHarnessRichLine (63), GhosttyHarnessSnapshot (68).
    • stay (server/bridge concern): GhosttyRequestAsset (79),
      GhosttyServedAsset (84), GhosttyBrowserBridge (88),
      GhosttyBrowserGlobal (97).
  • 111–138: constants (DEFAULT_PAGE_VIEWPORT, content-type strings,
    HARNESS_CONTENT_SECURITY_POLICY, MAX_REPLAY_BATCH_SIZE, RAF_TIMEOUT_MS) —
    stay (server/replay concern).
  • 140–929: const EMBEDDED_HARNESS_HTML = \…`;— the ~790-line embedded harness document. **Move (Step 1).** Its only consumer isloadHarnessHtml(line 1065:return EMBEDDED_HARNESS_HTML;`). Two comments (≈942, ≈972)
    reference it by name; they remain accurate after the move.
  • 931–1058: decoding helpers + generic assertions. Move (Step 2):
    GhosttyDecodedColumn (931, exported), stripTrailingAsciiSpaces (945,
    exported), assembleCanonicalLine (974, exported), assertNonNegativeInteger
    (998), assertPositiveInteger (1008), assertPositiveNumber (1018),
    assertHexColor (1028), normalizeError (1036).
  • 1061–1419: harness loaders/validators. Move (Step 2): loadHarnessHtml
    (1061), validateHarnessLines (1169), validateHarnessSnapshotCells (1217),
    validateHarnessSnapshot (1337).
  • 1421–2814: export class GhosttyWebBackendstays (boot, replayTo,
    snapshot, screenshot, video, dispose, and the HTTP server / bridge methods).

External consumers (must keep working):

  • test/unit/renderer/ghosttyWebDecode.test.ts:4-7 imports
    assembleCanonicalLine, stripTrailingAsciiSpaces, and the type
    GhosttyDecodedColumn from ghosttyWeb/backend.js. After the move,
    backend.ts must re-export these three so this import keeps resolving.
  • src/renderer/libghosttyVt/backend.ts:18 and src/export/webm.ts:17 import
    the GhosttyWebBackend class from backend.js — the class stays, so these
    are unaffected.

Conventions to follow

  • Strict TS, NodeNext ESM, .js import extensions on every relative import,
    import type for type-only imports (oxlint enforces import type).
  • 2-space indent, single quotes, trailing commas, semicolons (oxfmt enforces);
    run the formatter after moves.
  • No circular imports. The new modules are leaves: harnessDecoding.ts
    imports from embeddedHarnessHtml.ts and src/util/*, never from
    backend.ts. backend.ts imports from both new modules.

Commands you will need

Purpose Command Expected
Typecheck npm run typecheck exit 0
Lint npm run lint exit 0
Format (fix) npm run format exit 0
Decode/backend unit tests npx vitest run test/unit/renderer all pass
e2e (visual) npm run test:e2e all pass
Line count wc -l src/renderer/ghosttyWeb/backend.ts ~1500 after

Scope

In scope (create + modify):

  • src/renderer/ghosttyWeb/embeddedHarnessHtml.ts (create) — the HTML constant.
  • src/renderer/ghosttyWeb/harnessDecoding.ts (create) — the snapshot interfaces,
    decode helpers, generic assertion helpers, and validators listed above.
  • src/renderer/ghosttyWeb/backend.ts (modify) — remove the moved symbols, add
    imports, re-export the three externally-consumed names.

Out of scope (do NOT change in this plan):

  • The GhosttyWebBackend class body and all its methods — especially the HTTP
    server / bridge methods (startServer, respondToRequest, buildHarnessUrl,
    isAllowedBrowserRequest, writeBridge, writeBatchBridge, resizeBridge,
    readHarnessSnapshot, etc.). Extracting those is a deliberate follow-up.
  • The server/bridge interfaces and the 111–138 constants (they belong with the
    server methods that stay).
  • Any behavior change. This is a move; logic must be byte-identical.
  • src/renderer/libghosttyVt/backend.ts, src/export/webm.ts (only consume the
    class, which doesn't move).
  • CHANGELOG.md (automation-owned).

Git workflow

  • Branch: advisor/006-split-ghostty-web-backend
  • Conventional Commits. Example: refactor: split harness HTML and decoding out of the ghostty-web backend.
    One commit per step is fine.
  • Do NOT push or open a PR unless instructed.

Steps

Step 1: Extract the embedded harness HTML

  1. Create src/renderer/ghosttyWeb/embeddedHarnessHtml.ts containing the moved
    constant:

    export const EMBEDDED_HARNESS_HTML = `<!doctype html>
    …(the entire current value, verbatim)…`;
  2. Delete the const EMBEDDED_HARNESS_HTML = …; block (lines ~140–929) from
    backend.ts.

  3. In backend.ts, add import { EMBEDDED_HARNESS_HTML } from './embeddedHarnessHtml.js';
    (with the other relative imports). loadHarnessHtml keeps using the name
    unchanged.

Verify: npm run typecheck → exit 0. wc -l src/renderer/ghosttyWeb/backend.ts
→ roughly 2020 lines (down ~790). npx vitest run test/unit/renderer → all pass.

Step 2: Extract the harness-decoding layer

  1. Create src/renderer/ghosttyWeb/harnessDecoding.ts. Move into it, verbatim,
    these symbols from backend.ts:
    • Interfaces: GhosttyHarnessVisibleLine, GhosttyHarnessSnapshotCell,
      GhosttyHarnessRichLine, GhosttyHarnessSnapshot.
    • GhosttyDecodedColumn, stripTrailingAsciiSpaces, assembleCanonicalLine.
    • assertNonNegativeInteger, assertPositiveInteger, assertPositiveNumber,
      assertHexColor, normalizeError.
    • loadHarnessHtml, validateHarnessLines, validateHarnessSnapshotCells,
      validateHarnessSnapshot.
    • Keep the existing export keyword on whatever was already exported; export
      everything backend.ts will need to import back.
  2. Add the new module's imports at its top: EMBEDDED_HARNESS_HTML from
    ./embeddedHarnessHtml.js, and invariant/unreachable (whichever are used)
    from ../../util/assert.js. Let npm run typecheck tell you exactly which
    util symbols and types are needed.
  3. In backend.ts, delete the moved blocks and add a single import from
    ./harnessDecoding.js for every moved symbol the class still references
    (the validators, the assertion helpers used in replayTo/screenshot, etc.).
    Run npm run typecheck and add/remove imports until it is clean.

Verify: npm run typecheck → exit 0; npx vitest run test/unit/renderer
→ all pass.

Step 3: Re-export the externally-consumed names and tidy

  1. In backend.ts, add a re-export so the existing decode test keeps resolving:

    export {
      assembleCanonicalLine,
      stripTrailingAsciiSpaces,
    } from './harnessDecoding.js';
    export type { GhosttyDecodedColumn } from './harnessDecoding.js';

    (Do not edit test/unit/renderer/ghosttyWebDecode.test.ts — the re-export is
    what keeps its from '…/backend.js' import valid.)

  2. Run npm run format, then npm run lint → both exit 0.

Verify: npx vitest run test/unit/renderer/ghosttyWebDecode.test.ts
→ all pass (proves the re-export works).

Step 4: Full behavior gate

  • npm run typecheck → exit 0.
  • npm run lint → exit 0.
  • npm run test:unit → all pass.
  • npm run test:e2e → all pass (this exercises the real ghostty-web rendering /
    screenshot path end-to-end; it is the proof the move changed no behavior). If
    e2e cannot run in this environment, say so explicitly.

Test plan

This is a refactor with no new behavior, so the test plan is regression:

  • test/unit/renderer/ghosttyWebDecode.test.ts (decode helpers) — must pass
    unchanged via the re-export.
  • test/unit/renderer/ghosttyWebBackend.test.ts, canonicalScreen.test.ts, and
    the rest of test/unit/renderer/ — must pass unchanged.
  • npm run test:e2e — must pass unchanged (visual/screenshot parity).
  • No new test files are required. If you find a moved helper had zero
    coverage and you want to add a focused unit test for it in
    test/unit/renderer/, that's welcome but optional.

Done criteria

ALL must hold:

  • src/renderer/ghosttyWeb/embeddedHarnessHtml.ts and
    src/renderer/ghosttyWeb/harnessDecoding.ts exist.
  • grep -n "EMBEDDED_HARNESS_HTML = " src/renderer/ghosttyWeb/backend.ts
    → no match (constant moved out).
  • wc -l src/renderer/ghosttyWeb/backend.ts → roughly 1500 lines (down from 2814).
  • npm run typecheck, npm run lint, npm run format:check all exit 0.
  • npm run test:unit exits 0, including test/unit/renderer/ghosttyWebDecode.test.ts
    (proves the re-export).
  • npm run test:e2e passes (or its inability to run here is reported).
  • git diff shows only moves/imports/re-exports — no logic edits inside any
    moved function, no change to the GhosttyWebBackend class methods.
  • No CHANGELOG.md change; no out-of-scope files modified (git status).
  • plans/README.md status row updated.

STOP conditions

Stop and report back (do not improvise) if:

  • Moving a symbol creates a circular import that typecheck flags
    (harnessDecoding.ts must never import from backend.ts). If a moved function
    genuinely depends on something only the class has, leave that function in
    backend.ts and report it.
  • Any test/unit/renderer/* or e2e test fails after a move — that means a move
    was not behavior-preserving (likely a missed import or an accidental edit).
    Do not change the test; find the move error or report it.
  • The GhosttyWebBackend class needs edits beyond import lines to compile — that
    signals you've moved something that should have stayed; report it.
  • backend.ts does not end up substantially smaller (e.g. still > 1800 lines) —
    re-check that both Step 1 and Step 2 actually removed their blocks.

Maintenance notes

  • Deferred to a follow-up plan: extracting the HTTP server + browser-bridge
    methods (startServer, respondToRequest, asset serving, buildHarnessUrl,
    isAllowedBrowserRequest, the *Bridge methods) into a server.ts / a bridge
    helper. Those touch this (the server, serverOrigin, page fields), so
    they need dependency extraction or a small server class — higher risk, separate
    change. This plan intentionally stops before that.
  • A reviewer should confirm the diff is move-only: no function body changed, and
    the re-exports preserve the public import surface (ghosttyWebDecode.test.ts
    and any other importer of the three decode symbols still resolve).
  • The two in-code comments that mention EMBEDDED_HARNESS_HTML (the canonical-line
    helpers note they must stay in sync with the harness copy) remain correct and
    should be left as-is.

@ThomasK33 ThomasK33 added this pull request to the merge queue Jun 16, 2026
Merged via the queue into main with commit 5389ff3 Jun 16, 2026
13 checks passed
@ThomasK33 ThomasK33 deleted the advisor/implement-all-plans branch June 16, 2026 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant