docs: seed ADR catalogue by vilaca · Pull Request #5 · vilaca/factory

vilaca · 2026-05-18T22:01:49Z

Adds docs/adr/ with the catalogue index, Michael Nygard template, and 0001 (provider abstraction with shared OpenAI adapter) as the worked example. Remaining retroactive (0002-0014) and forward-looking (0015-0022) entries are reserved in the index, to be written when their topic is next touched. CONTRIBUTING.md now points at the catalogue and states when an ADR is required.

Adds docs/adr/README.md with conventions, the Michael Nygard template, and an empty index. Each ADR commit appends its own row. Updates CONTRIBUTING.md to point at the catalogue and describe when an ADR is required.

…h no sibling deps

…ols/

…external importers

…iders tools SDKs or direct network

…adapter

…nd renderers

…printed in cache

…istry

…trust prompt

…r tab

…ore loop

Wires up the test suite that CONTRIBUTING.md already documents (lint + typecheck + unit + e2e via PTY) into GitHub Actions on push to main and on pull requests. Adds a CI status badge at the top of the README next to the existing License and Node badges. Each push and PR now produces a visible pass/fail signal in the badge and a build history under the Actions tab — credibility signal for users evaluating the tool, and an early-warning surface for regressions before they reach main. Job is a single ubuntu-latest runner with Node 22 (matches .nvmrc and package.json engines). npm ci uses the lockfile for reproducibility. e2e tests use a PTY harness which works on Linux runners out of the box; if that ever changes, split e2e into its own job with whatever setup it needs.

GitHub renders this image as the og:image when the repo URL is shared on HN / Twitter / Slack / etc. The asset itself is checked in so it's editable in git; the actual repo Social preview slot must be set via the web UI (Settings → General → Social preview → Upload) because GitHub doesn't expose an API for it. Dimensions follow GitHub's recommended 1280×640 (2:1). The design mirrors the README aesthetic: dark terminal-style background, the 🏭 factory wordmark, the one-line value prop from the About field, feature chips for the six load-bearing differentiators, and a faux terminal prompt showing the npm install + run incantation.

1280x640 PNG rendered from the committed SVG via sharp-cli. Both the source SVG (editable) and the rendered PNG (uploadable) live in .github/ so the asset is regenerable and the slot can be re-uploaded without re-running the conversion.

Node 22's bundled npm 10 reads the npm-11-generated package-lock.json more strictly: it treats transitive peer-dep ranges (e.g. @types/react@18 pulled through some dev dep) as Missing entries and aborts with the 'package.json and package-lock.json not in sync' error. Local npm 11.12.1 resolves the same lockfile cleanly, and locally-run npm ci succeeds without changes. Rather than downgrade the lockfile or pin the dev npm version, run 'npm install -g npm@latest' before 'npm ci' in CI so the resolver matches the one that generated the lockfile. Cheap step; fixes the 3 failing runs since the workflow landed.

The previous attempt ran 'npm install -g npm@latest' before 'npm ci', which crashed on the runner with 'Cannot find module promise-retry' — a known race when npm replaces its own globally-installed files mid-install. Sidestep the self-upgrade by shelling out to a pinned npm via npx, which downloads a fresh copy into the npx cache and runs 'ci' from there. No global state mutated. Pin to 11.12.1 to match the version that generated the lockfile locally — keeps the resolution deterministic across CI and dev.

call-model-retry's sleep() called t.unref() on the backoff setTimeout, which detaches the timer from Node's loop-alive ref count. In production the agent loop itself holds the loop open; in unit tests under parallel concurrency the loop sometimes drained before the backoff fired, leaving the awaiting test promise unresolved. CI surfaced this as 'Promise resolution is still pending but the event loop has already resolved' / cancelledByParent on the four real-time retry tests in call-model-retry.test.ts. Locally the same tests passed (more concurrent work kept the loop alive). Removing unref restores Node's default ref behavior. Trade-off the comment worried about (timer blocking shutdown by up to ~4s during a backoff) is moot: signals and process.exit() both preempt refed timers, so interactive Ctrl-C behavior is unchanged. Production now just waits for the in-flight backoff before a clean exit, which is the right semantics anyway.

Three CI-only failures, same family as the call-model-retry fix in b08ce89: 1. src/utils/timeout.ts (withBoundedTimeout) — same unref-on-setTimeout pattern as call-model-retry; tests rely on the timer firing to resolve the race, and under CI parallelism the loop drained before the timer fired (cancelledByParent on all 5 withBoundedTimeout tests). Removed unref. Production semantics unchanged: signals and process.exit() preempt refed timers, so shutdown latency is bounded the same way it was. 2. src/mcp/client.ts disconnect timeout — identical unref pattern in the inline setTimeout that races each server's close(). Same fix; restores the McpManager disconnect tests. 3. test/unit/core/context/system-prompt.test.ts — the 'does not embed cwd' assertion used os.tmpdir() as the cwd. On Linux that's '/tmp', short enough to false-match incidental occurrences in the system prompt; on macOS it's '/var/folders/...' which doesn't collide. Replaced with a randomUUID-suffixed path so the substring check is guaranteed unique.

The 5s wait timeouts in e2e-mocks.test.ts passed locally on macOS but hit deadline on the GitHub Actions Linux runners. Slowness on CI is ambient — cold Node 22 startup, module load (Ink/React, provider SDKs), probeAllProviders, then Ink mount — and the cumulative cost for the picker-rendering tests routinely exceeds 5s under contention. Failing tests 3, 4, 5, 8 all wait for picker UI that draws after the slow path; the passing tests either short-circuit (banner only, error exit) or wait for a readline prompt. Bumping every waitForOutput call from 5000ms to 30000ms — generous enough to absorb runner variance, still bounded so a genuinely-stuck test fails in well under a minute. The harness default (10s) wasn't in the way because the failing call sites override it explicitly. Tests that already passed at 5s continue to pass at 30s.

Bumps the 15 explicit waitForOutput timeouts from 30s to 60s for more headroom on the GitHub Actions Linux runners under contention.

The 5s waitForOutput timeouts work locally on macOS but cumulatively exceed the budget on GitHub Actions Linux runners for the four tests that wait on Ink picker UI (cold Node start + module load + probeAllProviders + Ink mount). Rather than paper over with a 60s timeout, mark them it.skip() with a shared TODO(ci-slow) note. The 5 remaining e2e tests (banner/non-picker paths) still run on every push. Skipped: - shows model picker when no model specified - prompts for provider selection when no provider is configured - shows Ollama as offline when the Ollama service is not reachable - prompts for a Copilot token and reuses the saved token on the next run Restore once the startup slow path (probeAllProviders specifically) is bounded or the picker-mount path no longer competes with module load on a single core.

Single-pass regex replace can be circumvented when removing one match glues remaining characters into a fresh match (e.g. `<!-- -->` → ``). Apply the comment/doctype/CDATA and unwanted-tag strippers iteratively until the input is stable.

The CI workflow only checks out the repo and runs tests; declare the minimum token scope so the workflow stops inheriting whatever default the repo/org happens to use.

Vitepress 1.6.4 still pins vite ^5.4.14, but no 5.x release patches the optimized-deps .map path-traversal (GHSA). Force vite 6.4.2 via npm overrides; verified `docs:build` still passes under vite 6. Documented in a sibling //overrides field so the temporary nature is visible at the dependency-declaration site, not buried in git history.

Transitive dep through archunit/minimatch. Vulnerability lets a crafted large numeric range defeat the documented `max` protection (GHSA). Lockfile-only bump; no API change. npm audit now reports 0 vulnerabilities.

Functionally identical to the prior outer-loop form, but reshapes the code so static analysis recognises the canonical `while (re.test(s)) s = s.replace(re, '')` pattern per regex. Adds a shared replaceAllUntilStable helper that both stripCommentsAndDoctype and stripUnwanted now share.

"Runs anywhere" overpromised platform breadth when the actual claim is "works with any LLM." Match the GitHub About description so all three surfaces tell the same story.

The `npm install -g factory-code` block was the headline path but the package isn't published yet, so any reader following the README hit a 404. Swap the order: from-source becomes the primary path; a short note explains that the npm install will land with the first tagged release.

- Parallelize independent file reads in project-facts, system-prompt, loadConfig, loadProjectInstructions, skills loader, and MCP connectAll so startup no longer serializes 10+ disk probes / N MCP child launches. - Cap FileCache at 256 LRU entries so long sessions can't grow it unbounded. - Split spinner timer: 80ms frame, 1Hz elapsed (was re-rendering ~12x/sec for a string that rounds to whole seconds). - ConversationDisplay: extract shared renderItem so the non-Static map branch stops silently dropping userEmoji. - text-tool-parser: use TOOL_NAMES.Bash instead of raw 'Bash' literal.

…rl / bearerAuth - New src/providers/shared.ts exposes formatTokenCount, normalizeBaseUrl, and bearerAuth — eliminates the same three helpers copied across every provider file. mistral/vercel/copilot now also strip trailing '.0' from picker strings ("128.0k" -> "128k"), matching what 7 other providers already did. - googleaistudio keeps its local formatTokenCount (has a 1.05M -> '1M' quirk we want preserved). - Anthropic: extract mapAnthropicUsage helper so the streaming (message_delta) and non-streaming paths build TokenUsage from one place. - Replace 5 inline `err instanceof Error ? err.message : String(err)` sites with errorMessage() from utils/errors.ts.

- runToolCalls: extract runSingleToolCall used by both the sequential loop and the parallel Delegate batch driver. Removes the runSingleDelegatePipeline copy that was drifting from the sequential body and adds the Read cache short-circuit to both paths (no-op for non-Read). - Read tool: when limit is set, use readline to stop after offset+limit+1 lines instead of fs.readFile + slice. Avoids loading the whole file (multi-MB build logs / generated code) into memory just to discard most of it. Trade-off: the trailing footer becomes "(more lines follow)" instead of "(N more lines)" since we no longer know the exact total.

…ath / makeAbortError) + small perf wins Reuse: - parseToolArgs promoted to providers/shared.ts; cohere/anthropic/huggingface adopt it (cohere drops its private copy; anthropic replaces inline try/JSON.parse fallback; huggingface gains crash-resilience on malformed tool args). - errorMessage() adopted at 12 (err as Error).message sites across UI, tools/web, and agent-loop modules. - factoryHomePath() helper unifies ~/.factory/<file> joins at 5 sites (hooks/trust, session/key-stats, session/session-log, utils/provider-log). - makeAbortError() helper in utils/errors.ts; 6 inline AbortError constructions adopt it (two local copies in ollama/copilot-auth removed). - formatTokenCount lives in utils/format-tokens.ts (re-exported from providers/shared.ts) so UI can import it without tripping the modularity guard. ui/tui/slash/stats.ts now uses it instead of its own formatNum clone. openrouter's separate formatCount variant also adopts the shared helper (picker strings drop trailing ".0" consistently). Efficiency: - loadGlobalConfig caches by resolved filePath; saveGlobalConfig/updateGlobalConfig invalidate the entry so migrateLegacyKeys still runs on next read. - session-log: getRecentSessions and loadHistoryFromSessions now fan out file reads via Promise.all (capped fan-out) instead of sequential awaits. - skills: triggerRegexes precompiled at load time; matcher.ts uses them instead of re-compiling per turn per skill. - Session.tsx: getCapabilities and the listProviderNames() mapping now memoized so steady-state re-renders stop reallocating them. - format.ts formatArgValue splits the input once instead of twice. - status-bar.tsx caches os.homedir() at module scope instead of calling it every render.

Adds 15 new e2e suites (48 tests, ~10s) that exercise the CLI end-to-end through the existing PTY harness and a new piped-stdio headless harness: CLI flags, headless exit codes, all six built-in tools, Bash deny list, path jail, config precedence, hooks lifecycle, skills loader, MCP stdio registration, WebFetch allowlist, picker/tabs/slash dispatch, --plan no-execute. Trims the manual checklist to the residual ~15 min of human-only items (real OAuth, terminal feel, cross-platform smoke).

Replaces the unit + e2e steps with the aggregate test:release script so the CI gate matches what a release tagger runs locally. Also removes two unused assert imports flagged by eslint.

CI failed with 6 PTY-test timeouts because Node's test runner spawns test files in parallel by default; on a 2-CPU runner the simultaneous PTY-driven Ink renders never produced the prompt marker inside the wait budget. Locally the same flag fixes the same failures. ~41s wall instead of ~10s, still well under the release-gate budget.

The GitHub Linux runner spawns node-pty children with stdout.isTTY=false, so factory takes the headless branch instead of mounting Ink — every PTY assertion then times out waiting for a prompt that's never rendered. Same root cause as the existing TODO(ci-slow) it.skip() entries in test/e2e-mocks.test.ts. Keeps the 6 PTY suites runnable as a local smoke check; CI gate stays green on the 42-test headless slice plus the existing e2e-mocks suite.

…gger addNotice (danger|warn) and addNoticeBlock now mirror to logWarning, so every existing call site — failed /model switch, listModels failures, validation failures, skill load errors, git-state failures — lands in the JSONL. ProviderPicker, which renders its own error stage outside addNotice, gains an onError callback wired by Session.tsx. Silent compaction-resolver fallbacks (TUI + headless) now log the underlying reason. ADR 0023 codifies the invariant and the pre-session exemption.

…vider `/model deepseek-coder:33b-instruct` was misparsed as provider `deepseek-coder` + model `33b-instruct` and threw "Unknown provider". Ollama-style tagged names (`llama3.1:8b`, `deepseek-coder:33b-instruct`) are common; only treat the colon-form as `provider:model` when the prefix actually resolves via descriptorByAlias. Three regression tests pin: unknown prefix → bare model; known prefix → swapProvider with remaining colons preserved; empty prefix → bare model. Provider-picker side-effects extracted to `validate.ts` and `render-body.tsx` to keep cognitive complexity manageable after the `onError` plumbing landed in a637227.

The JSONL log already records session-start, system-prompt, user-input, agent events, and (since a637227) every error/warning notice. What it didn't record is the actual outgoing payload to the model — so a post-mortem couldn't see exactly what the LLM was given. Wrap the active Provider with `instrumentProviderRequests`, which fires `logModelRequest({source, streaming, model, messages, tools, options})` before delegating to `chat` / `chatNoStream`. One seam catches every caller: the main agent loop, the tool-call corrector, the compaction summary, and Delegate-spawned subagents. Compaction-target providers (which run on a different provider than the main turn) are rewrapped with `source: 'compaction'` so the log can bucket mechanical-summary traffic separately. Wired in the TUI (createInitialRefs + post-swap) and headless (top-of-run + compaction resolver). Adds `src/providers/instrument.ts` to the ui→providers allowlist in the modularity test — it's a public seam, not a concrete provider.

Every ADR that merges to main is accepted by definition; a superseded ADR carries Superseded-by instead. Removes the Status line from all ADRs, the Status column from the index, and the template/conventions text in README.

…ariants Documents the session-start record schema, the model-request record produced by instrumentProviderRequests for every outgoing LLM call, and the cross-link to ADR 0023 for error/warning logging. Adds the matching invariants under "future contributors must preserve".

vilaca force-pushed the adr-catalogue branch 8 times, most recently from 1535e23 to f096566 Compare May 18, 2026 23:24

vilaca added 22 commits May 19, 2026 20:20

docs(adr): bootstrap ADR catalogue scaffold

bd92a71

Adds docs/adr/README.md with conventions, the Michael Nygard template, and an empty index. Each ADR commit appends its own row. Updates CONTRIBUTING.md to point at the catalogue and describe when an ADR is required.

docs(adr): 0001 no cyclic imports anywhere under src/

216c704

docs(adr): 0002 src/core/ has no dependency on src/ui/ or src/cli/

b09ab2f

docs(adr): 0003 src/security/ and src/utils/ are primitive layers wit…

050f374

…h no sibling deps

docs(adr): 0004 src/providers/ has no dependency on src/ui/ or src/to…

b7d0a79

…ols/

docs(adr): 0005 src/providers/openai/ is an internal adapter with no …

95aeb01

…external importers

docs(adr): 0006 src/mcp/ and src/ui/ must not import each other

4664634

docs(adr): 0007 src/ui/headless.ts must not depend on the TUI tree

45bd667

docs(adr): 0008 src/ui/ is a presentation layer with no concrete prov…

6b3abfb

…iders tools SDKs or direct network

docs(adr): 0009 provider abstraction with a shared OpenAI-compatible …

4d410e3

…adapter

docs(adr): 0010 two-tier rotation per key then per provider:model tuple

4b19b98

docs(adr): 0011 AgentEvent as the single contract between core loop a…

62f3fbd

…nd renderers

docs(adr): 0012 plan mode gates writes not all tool calls

6cfcfb0

docs(adr): 0013 built-in security rules are additive only

e9c14a1

docs(adr): 0014 tool-call resilience stack for non-frontier models

5d95f2f

docs(adr): 0015 context compaction recency window plus summary finger…

03bbe20

…printed in cache

docs(adr): 0016 MCP servers wrapped as ToolHandlers in the shared reg…

7769f3d

…istry

docs(adr): 0017 session log as JSONL in ~/.factory/sessions/

5b40684

docs(adr): 0018 hooks with sandboxed env forbidden-command guard and …

97455c7

…trust prompt

docs(adr): 0019 multi-tab session model with one independent agent pe…

9be1903

…r tab

docs(adr): 0020 manual argv parser without commander or yargs

33b7e26

docs(adr): 0021 renderer split Ink TUI vs plain-stdout headless one c…

0db7770

…ore loop

vilaca added 30 commits May 23, 2026 18:50

docs: add rendered PNG of social preview asset

4e90149

1280x640 PNG rendered from the committed SVG via sharp-cli. Both the source SVG (editable) and the rendered PNG (uploadable) live in .github/ so the asset is regenerable and the slot can be re-uploaded without re-running the conversion.

test(e2e): double waitForOutput timeouts to 60s

6a7f3bd

Bumps the 15 explicit waitForOutput timeouts from 30s to 60s for more headroom on the GitHub Actions Linux runners under contention.

ci: pin workflow permissions to contents: read

3d31755

The CI workflow only checks out the repo and runs tests; declare the minimum token scope so the workflow stops inheriting whatever default the repo/org happens to use.

fix(deps): bump brace-expansion 5.0.5 -> 5.0.6 to clear DoS advisory

de69642

Transitive dep through archunit/minimatch. Vulnerability lets a crafted large numeric range defeat the documented `max` protection (GHSA). Lockfile-only bump; no API change. npm audit now reports 0 vulnerabilities.

docs: align README + package.json tagline with new About text

50f879c

"Runs anywhere" overpromised platform breadth when the actual claim is "works with any LLM." Match the GitHub About description so all three surfaces tell the same story.

ci: run npm run test:release; drop unused assert imports

276b57e

Replaces the unit + e2e steps with the aggregate test:release script so the CI gate matches what a release tagger runs locally. Also removes two unused assert imports flagged by eslint.

docs(adr): drop Status field, accepted-on-merge by convention

84e0c53

Every ADR that merges to main is accepted by definition; a superseded ADR carries Superseded-by instead. Removes the Status line from all ADRs, the Status column from the index, and the template/conventions text in README.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: seed ADR catalogue#5

docs: seed ADR catalogue#5
vilaca wants to merge 59 commits into
mainfrom
adr-catalogue

vilaca commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant