Land the M1 stack on main (stacked PRs #2–#5 merged into their bases) #8
Merged
…luators (M1 PR-4)

- src/core/lint/: gate runner (S1 generic surface schema, S2 contract vocabulary, S3 governance), independently reported, never implicit in generation (the S0 spike found Ollama's mlx engine silently ignoring `format`, which is why S2 is a check on the artifact).
- rule-type registry + evaluators per spec/dspack-v0.3.md §5.3 semantics. Findings carry both severity faces (requirement: must|should, level: error|warn); rationales verbatim; locations as $.root… paths. Unknown rule types throw UnknownRuleTypeError (CLI exit 4); they are never skipped.
- DEVIATION FROM THE M1 DIRECTIVE, flagged for review: forbidden-composition is implemented now (not M2/PR-8). Forced by a conflict discovered in implementation: the v0.3 shadcn contract carries a UNIVERSAL forbidden-composition rule (rule.button-no-interactive-descendants) and spec §5.4 forbids skipping unimplemented types, so a two-evaluator linter would exit 4 on every lint of the real contract. Fixture F5 activates with it (all five golden fixtures active).
- CLI `lint`: JSON report on stdout (golden-comparable), human rendering on stderr; exits 0 clean / 2 any S-gate error / 4 unknown rule type.
- fixtures/golden/violating/F1-F5 + clean golden + checked-in expected reports; core-boundary test now walks recursively (lint/ included), ajv allowed as the only non-node bare import.

Verify: npm test; npx tsx src/cli.ts lint --dspack fixtures/shadcn.v0_3.dspack.json --surface fixtures/golden/violating/F1-dialog-for-delete.dsurface.json (exit 2, stdout equals F1-dialog-for-delete.expected.json)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
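The "unknown rule types throw, never skip" behavior and the lint exit codes can be illustrated with a small sketch. Everything here is an assumption for illustration: the `Rule`, `Finding`, and `SurfaceNode` shapes, the `toy-forbidden-component` rule type, the must→error mapping, and the exact path format are not taken from the repository. Only the `UnknownRuleTypeError` name and the 0 / 2 / 4 exit codes come from the commit message.

```typescript
// Hypothetical shapes for illustration; the real types in src/core/lint/ may differ.
type Requirement = "must" | "should";
type Level = "error" | "warn";

interface Finding {
  ruleId: string;
  requirement: Requirement;
  level: Level;
  location: string; // assumed path format, e.g. "$.root.children[0]"
  rationale: string;
}

interface Rule {
  id: string;
  type: string;
  requirement: Requirement;
  rationale: string;
  target: string; // toy parameter: the component name this rule looks for
}

interface SurfaceNode {
  component: string;
  children?: SurfaceNode[];
}

type Evaluator = (rule: Rule, root: SurfaceNode) => Finding[];

class UnknownRuleTypeError extends Error {
  readonly ruleType: string;
  constructor(ruleType: string) {
    super(`unknown rule type: ${ruleType}`);
    this.name = "UnknownRuleTypeError";
    this.ruleType = ruleType;
  }
}

// Depth-first walk that tracks a path string for each visited node.
function walk(node: SurfaceNode, path: string, visit: (n: SurfaceNode, p: string) => void): void {
  visit(node, path);
  (node.children ?? []).forEach((child, i) => walk(child, `${path}.children[${i}]`, visit));
}

const registry = new Map<string, Evaluator>();

// Toy rule type (not from the spec): flag every node whose component equals rule.target.
registry.set("toy-forbidden-component", (rule, root) => {
  const findings: Finding[] = [];
  walk(root, "$.root", (node, path) => {
    if (node.component === rule.target) {
      findings.push({
        ruleId: rule.id,
        requirement: rule.requirement,
        level: rule.requirement === "must" ? "error" : "warn", // assumed mapping
        location: path,
        rationale: rule.rationale, // carried through verbatim
      });
    }
  });
  return findings;
});

function evaluateRules(rules: Rule[], root: SurfaceNode): Finding[] {
  const findings: Finding[] = [];
  for (const rule of rules) {
    const evaluator = registry.get(rule.type);
    // An unregistered type is a hard failure rather than a silent skip.
    if (evaluator === undefined) throw new UnknownRuleTypeError(rule.type);
    findings.push(...evaluator(rule, root));
  }
  return findings;
}

// Exit codes as listed in the commit message: 0 clean, 2 any error, 4 unknown rule type.
function lintExitCode(rules: Rule[], root: SurfaceNode): number {
  try {
    return evaluateRules(rules, root).some((f) => f.level === "error") ? 2 : 0;
  } catch (err) {
    if (err instanceof UnknownRuleTypeError) return 4;
    throw err;
  }
}
```

Throwing at lookup time, rather than filtering unknown types out, is what makes a contract with an unimplemented rule type fail every lint, which matches the stated reason for implementing forbidden-composition early.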
…9 review fix)

Semantic alignment with the now-normative spec v0.3 §5: S2 and rule resolution work by sub-component id alone, so duplicate ids across components fail loudly (naming the id and every declaring component) before any id-dependent check, instead of resolving by object iteration order. Same error shape as the dspack validate harness; covered by a new lint test. No golden outputs change (the shadcn contract has no duplicates).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
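A minimal sketch of the duplicate-id check described above, assuming a simplified contract shape. The `ComponentDecl` interface, the function names, and the error message format are hypothetical; the actual helper in src/core/contract.ts and its error shape may look different.

```typescript
// Hypothetical contract shape; the real dspack contract types may differ.
interface ComponentDecl {
  id: string;
  subComponents?: { id: string }[];
}

// Map each sub-component id to every component that declares it.
function subComponentOwners(components: ComponentDecl[]): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const component of components) {
    for (const sub of component.subComponents ?? []) {
      const list = owners.get(sub.id) ?? [];
      list.push(component.id);
      owners.set(sub.id, list);
    }
  }
  return owners;
}

// Fail loudly before any id-dependent check runs, naming the id and all declaring components.
function assertUniqueSubComponentIds(components: ComponentDecl[]): void {
  const problems: string[] = [];
  subComponentOwners(components).forEach((declaredBy, id) => {
    if (declaredBy.length > 1) {
      problems.push(`"${id}" declared by ${declaredBy.join(", ")}`);
    }
  });
  if (problems.length > 0) {
    throw new Error(`duplicate sub-component ids: ${problems.join("; ")}`);
  }
}
```

Collecting every owner before throwing means the error lists all declaring components at once, instead of depending on which declaration happens to be seen first.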
… PR-5)

- src/adapters/: stateless GenerationAdapter interface per ADR-9 (one attempt per call; the repair loop owns conversation state). Model identity is configuration: constructors require an explicit model id, and no default model name exists in code (enforced by a source-scan test).
- OllamaAdapter: /api/chat with format = the generation schema. Non-JSON output raises AdapterOutputError (the S0 spike found the mlx engine silently ignoring format; gates S1/S2 judge conformance over the artifact).
- AnthropicAdapter: official SDK, output_config.format json_schema (the generation schema is compatible by construction: depth-unrolled, closed objects). No sampling params sent (removed on current models). refusal and max_tokens stop reasons surface as typed errors and are never silently retried.
- Offline deterministic tests via injected fetch fixtures: parsed results, schema round-trip verbatim into the request body, typed failures.
- scripts/smoke-ollama.ts (live, non-CI): one real generation through the compiled context + S1-S3 lint. First live runs recorded in the spike addendum: the 8B model passes S1/S2 but fails S3 on every attempt (nested-interactive violations) even with rule steering in the prompt, a live confirmation that the guarantee is the linter, not the prompt.

Verify: npm test (44 tests, offline); npm run smoke:ollama -- --model <tag>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
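Two of the adapter-side guarantees above (no default model, and non-JSON output as a typed error) can be sketched without any network code. This is an illustration under assumptions: the `<provider>:<model>` reference format is inferred from the `--model ollama:<tag>` usage elsewhere in this PR, and `parseModelRef` / `parseModelOutput` are hypothetical helper names, not the repository's API. Only the `AdapterOutputError` name comes from the commit message.

```typescript
class AdapterOutputError extends Error {
  readonly raw: string;
  constructor(raw: string) {
    super("model output is not valid JSON");
    this.name = "AdapterOutputError";
    this.raw = raw; // keep the raw text so a report can show what the model returned
  }
}

interface ModelRef {
  provider: string;
  model: string;
}

// Split on the FIRST colon only, so tags that contain a colon (e.g. "qwen3:8b") survive.
// There is deliberately no fallback: a missing provider or model is an error.
function parseModelRef(ref: string): ModelRef {
  const i = ref.indexOf(":");
  if (i <= 0 || i === ref.length - 1) {
    throw new Error(`model reference must look like "<provider>:<model>", got "${ref}"`);
  }
  return { provider: ref.slice(0, i), model: ref.slice(i + 1) };
}

// A runtime may ignore the requested output format, so the adapter parses the
// text itself and raises a typed error instead of passing a string downstream.
function parseModelOutput(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    throw new AdapterOutputError(raw);
  }
}
```

Parsing proves only that the output is JSON; whether it conforms to the schema is left to the S1/S2 gates, as the commit message describes.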
… PR-6)

- src/run/orchestrator.ts: generate → surface gates S1-S3 → bounded repair (default 2; system prompt immutable across attempts; the only delta is the model's own output + rendered repair feedback, snapshot per attempt) → emit via the pinned @aestheticfunction/dspack-to-a2ui git dependency → emitter gates A1-A3 (both A2UI versions) → audit report v1. Every outcome is a first-class artifact: passed / failed-lint-exhausted (exit 2) / failed-gate (exit 3) / failed-adapter (exit 1; added to the plan's enum because the S0 spike showed runtimes can fail to constrain, and that must be a reported outcome, never a silent retry).
- src/repair/render.ts (ADR-7): one findings object, two serializations. The repair message is rendered deterministically from the same findings embedded in the report, with linked examples verbatim as corrected references. Golden-file tested.
- src/audit/: report v1 + schemas/audit-report.v1.schema.json + markdown rendering + docs/AUDIT.md (additive-only guarantee, stable enums, reproducibility fields: contract sha256, schema sha256, adapter id, per-attempt model + provider meta).
- src/adapters/fake.ts: ScriptedAdapter, the deterministic instrument for CI, the demo's verification mode, and eval goldens.
- CLI `run` writes audit-report.json/.md + generated.surface.json; exit codes per the README table.

Live verification (recorded in out/ locally, not committed): qwen3:8b violates S3 on attempt 1 (5 findings), repairs to 1, exhausts honestly (failed-lint-exhausted); gpt-oss 20B passes S1-S3 + A1-A3 on attempt 1.

Verify: npm test (52 tests, offline/deterministic); npx tsx src/cli.ts run --dspack fixtures/shadcn.v0_3.dspack.json --intent destructive-action --prompt "a screen to delete my account" --model ollama:<tag> --out out

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
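The bounded repair loop can be sketched as follows. This is a simplified, synchronous illustration, not the orchestrator's code: the real adapters are presumably asynchronous, findings are richer than strings, and the emit and A1-A3 stages are omitted. `runWithRepair`, `renderRepair`, and the message shapes are hypothetical names; the default of 2 repairs and the two outcome names come from the commit message.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Attempt {
  output: string;
  findings: string[];
}

type Outcome = "passed" | "failed-lint-exhausted";

// Deterministic rendering: the same findings always produce the same repair text.
function renderRepair(findings: string[]): string {
  return ["The previous output violated these rules:", ...findings.map((f) => `- ${f}`)].join("\n");
}

function runWithRepair(
  generate: (messages: readonly Message[]) => string, // stateless: one attempt per call
  lint: (output: string) => string[],
  systemPrompt: string,
  userPrompt: string,
  maxRepairs = 2,
): { outcome: Outcome; attempts: Attempt[] } {
  // The loop, not the adapter, owns the conversation state.
  const messages: Message[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];
  const attempts: Attempt[] = [];

  // One initial attempt plus at most maxRepairs repair attempts.
  for (let i = 0; i <= maxRepairs; i++) {
    const output = generate(messages);
    const findings = lint(output);
    attempts.push({ output, findings });
    if (findings.length === 0) return { outcome: "passed", attempts };

    // The system prompt is never touched; the only delta between attempts is
    // the model's own output plus the rendered repair feedback.
    messages.push({ role: "assistant", content: output });
    messages.push({ role: "user", content: renderRepair(findings) });
  }
  return { outcome: "failed-lint-exhausted", attempts };
}
```

Because exhaustion returns a normal result with every attempt recorded, a run that never converges still produces a complete trail for the report rather than an exception.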
…R-7)

- src/serve.ts: `dspack-gen serve`, a localhost-only node:http endpoint (incidental plumbing, no framework). POST /run streams PipelineEvent NDJSON (start with applicable rule ids / attempt with S1-S3 gates + findings / repair message verbatim / emitted A1-A3 / done with the full audit report v1). fake:true runs the deterministic ScriptedAdapter (golden violating fixture F1 → the contract's worked example); live mode requires an explicit model reference, with no default model in code.
- orchestrator: observational onEvent hook (the report stays the artifact).
- e2e/flagship.spec.ts (`npm run demo:e2e`): drives the demo app's Generate view against serve in fake mode and asserts the entire flagship trail: violation with verbatim rationale, exact repair message, clean attempt with rule.alertdialog-requires-cancel listed as VERIFIED, A1-A3 green for both A2UI versions, the AlertDialog rendering + opening with cancel-before-confirm (and closing on Cancel), and the downloaded audit report validating against schemas/audit-report.v1.schema.json. DEMO_DIR points at a dspack-to-a2ui checkout with the Generate view (CI: sibling checkout).

Verify: npm test; DEMO_DIR=<dspack-to-a2ui checkout> npm run demo:e2e

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
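The NDJSON event stream and the observational hook might look roughly like this. The event names (start, attempt, repair, emitted, done) follow the commit message, but every field name inside the events is an assumption, as are the helper names `toNdjsonLine`, `parseNdjson`, and `safeEmit`.

```typescript
// Hypothetical event payloads; only the five event kinds come from the commit message.
type PipelineEvent =
  | { type: "start"; ruleIds: string[] }
  | { type: "attempt"; index: number; findings: string[] }
  | { type: "repair"; message: string }
  | { type: "emitted"; gates: string[] }
  | { type: "done"; outcome: string };

// NDJSON: one JSON document per line, newline-terminated.
function toNdjsonLine(event: PipelineEvent): string {
  return JSON.stringify(event) + "\n";
}

// Client-side counterpart: split on newlines and parse each non-empty line.
function parseNdjson(text: string): PipelineEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as PipelineEvent);
}

// Observational hook: a failing listener must not change the run's outcome,
// so any error it throws is swallowed here.
function safeEmit(onEvent: ((e: PipelineEvent) => void) | undefined, event: PipelineEvent): void {
  try {
    onEvent?.(event);
  } catch {
    // Intentionally ignored: the audit report, not the stream, is the artifact.
  }
}
```

JSON.stringify escapes newlines inside string values, so a multi-line repair message still occupies exactly one NDJSON line.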
…ged main

Review fixes (Copilot on #5):

- CORS restricted to the demo dev-server origins (localhost/127.0.0.1:5173, reflected with vary: origin) with allow-methods on preflight; no more wildcard readable by arbitrary sites while the server runs.
- Request bodies capped at 64KB (413 + destroy); no unbounded buffering.
- Fake mode selects the worked example BY INTENT and fails fast (400) when the contract has none; it never scripts `undefined`.
- onEvent is observational by enforcement: a throwing hook (e.g. stream write after client disconnect) is swallowed and can never change a run's outcome. Covered by a pipeline test.
- --port validated (integer 1-65535) with a clear usage error.
- New serve.test.ts: CORS allow/deny, 413, fake-no-example 400, full NDJSON event sequence (59 tests total).

Also bumps the pinned @aestheticfunction/dspack-to-a2ui to b47a2cf (merged main incl. the #6 emitter review fixes and #7 Generate-view landing), the single pin bump confirming the dep matches merged main, as directed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
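The CORS and `--port` fixes above amount to two small pure functions. The sketch below is an assumption-laden illustration: the `http://` scheme on the allowed origins, the specific allow-methods and allow-headers values, and the function names are guesses, not taken from src/serve.ts.

```typescript
// Assumed origin strings; the commit message only says localhost/127.0.0.1 on port 5173.
const ALLOWED_ORIGINS = new Set(["http://localhost:5173", "http://127.0.0.1:5173"]);

// Reflect the request origin only when it is on the allow-list; never send "*".
function corsHeaders(origin: string | undefined, isPreflight: boolean): Record<string, string> {
  if (origin === undefined || !ALLOWED_ORIGINS.has(origin)) return {};
  const headers: Record<string, string> = {
    "access-control-allow-origin": origin,
    // The response differs per origin, so caches must key on it.
    vary: "origin",
  };
  if (isPreflight) {
    headers["access-control-allow-methods"] = "POST, OPTIONS"; // assumed method list
    headers["access-control-allow-headers"] = "content-type"; // assumed header list
  }
  return headers;
}

// Accept only a plain decimal integer in 1..65535; anything else is a usage error.
function parsePort(raw: string): number {
  if (!/^\d+$/.test(raw)) {
    throw new Error(`--port must be an integer between 1 and 65535, got "${raw}"`);
  }
  const port = Number(raw);
  if (port < 1 || port > 65535) {
    throw new Error(`--port must be an integer between 1 and 65535, got "${raw}"`);
  }
  return port;
}
```

Returning no CORS headers for an unlisted origin leaves enforcement to the browser: the request may still reach the server, but the calling page cannot read the response.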
Pull request overview
This PR mechanically lands the full “M1 stack” onto main (previously merged into stacked base branches), bringing in the surface-gates linter (S1–S3), generation adapters, the pipeline orchestrator + audit report artifact, the local serve endpoint, and the Playwright flagship demo gate.
Changes:
- Adds surface gates S1–S3 (schema + vocabulary + rule evaluators) with golden fixtures and a `dspack-gen lint` CLI.
- Introduces generation adapters (Ollama/Anthropic + deterministic scripted adapter) and the end-to-end pipeline orchestrator that produces audit report v1.
- Adds `dspack-gen serve` plus a Playwright e2e gate and workflow wiring for CI.
Reviewed changes
Copilot reviewed 44 out of 46 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| vitest.config.ts | Configures Vitest to run unit/gate tests under src/**. |
| src/serve.ts | Adds localhost-only NDJSON streaming pipeline endpoint with basic hardening. |
| src/serve.test.ts | Tests CORS restrictions, body size cap, fake-mode behavior, and NDJSON sequence. |
| src/run/pipeline.test.ts | Deterministic acceptance tests for orchestrator success + failure paths and report validity. |
| src/run/orchestrator.ts | Implements generate→lint→repair→emit→validate pipeline and report construction. |
| src/repair/render.ts | Renders deterministic repair feedback from lint findings (ADR-7). |
| src/index.ts | Expands public root exports to include adapters/orchestrator/audit/repair. |
| src/core/surface-schema.ts | Vendors the surface v0.1 JSON schema used by gate S1. |
| src/core/lint/walk.ts | Adds surface tree traversal helpers shared by S2/S3. |
| src/core/lint/vocabulary.ts | Implements gate S2 (contract vocabulary validation). |
| src/core/lint/rules.ts | Implements gate S3 rule registry + evaluators and UnknownRuleTypeError. |
| src/core/lint/lint.test.ts | Adds golden-based acceptance tests for S1–S3 behavior and independence. |
| src/core/lint/index.ts | Wires S1–S3 linting with AJV and summarization. |
| src/core/lint/findings.ts | Defines Finding/LintReport shapes and deterministic text rendering. |
| src/core/index.ts | Exposes linting APIs from the ./core subpath. |
| src/core/core-boundary.test.ts | Updates core-boundary scan to include recursive module traversal + allowed imports. |
| src/core/contract.ts | Adds duplicate sub-component id detection helper used by S2. |
| src/cli.ts | Adds lint, run, and serve commands; writes audit artifacts for run. |
| src/audit/report.ts | Adds audit report v1 types, hashing utilities, and markdown rendering. |
| src/adapters/types.ts | Defines adapter interface, typed adapter errors, and model-ref parsing. |
| src/adapters/ollama.ts | Implements Ollama adapter using structured outputs (format). |
| src/adapters/index.ts | Exports adapters and provides adapterFor(modelRef) factory. |
| src/adapters/fake.ts | Adds deterministic scripted adapter for tests/demo/e2e. |
| src/adapters/anthropic.ts | Implements Anthropic adapter using SDK structured outputs. |
| src/adapters/adapters.test.ts | Offline deterministic adapter tests with injected fetch fixtures. |
| scripts/smoke-ollama.ts | Adds non-CI live Ollama smoke script (generation + S1–S3). |
| schemas/audit-report.v1.schema.json | Introduces JSON schema for audit report v1 artifact. |
| playwright.config.ts | Adds Playwright config to run demo e2e against serve and sibling demo repo. |
| package.json | Adds pinned emitter dependency, Anthropic SDK, and Playwright scripts/deps. |
| package-lock.json | Locks new dependencies including git-pinned emitter and Playwright. |
| fixtures/golden/violating/F5-nested-interactive.expected.json | Adds golden expected report for F5 violating fixture. |
| fixtures/golden/violating/F5-nested-interactive.dsurface.json | Adds F5 violating surface fixture. |
| fixtures/golden/violating/F4-missing-title.expected.json | Adds golden expected report for F4 violating fixture. |
| fixtures/golden/violating/F4-missing-title.dsurface.json | Adds F4 violating surface fixture. |
| fixtures/golden/violating/F3-missing-cancel.expected.json | Adds golden expected report for F3 violating fixture. |
| fixtures/golden/violating/F3-missing-cancel.dsurface.json | Adds F3 violating surface fixture. |
| fixtures/golden/violating/F2-no-confirmation.expected.json | Adds golden expected report for F2 violating fixture. |
| fixtures/golden/violating/F2-no-confirmation.dsurface.json | Adds F2 violating surface fixture. |
| fixtures/golden/violating/F1-dialog-for-delete.expected.json | Adds golden expected report for F1 violating fixture. |
| fixtures/golden/violating/F1-dialog-for-delete.dsurface.json | Adds F1 violating surface fixture. |
| fixtures/golden/repair/F1.repair.txt | Adds golden repair-message text for F1. |
| fixtures/golden/clean/delete-account.dsurface.json | Adds clean golden surface fixture (worked example). |
| e2e/flagship.spec.ts | Adds Playwright flagship end-to-end test validating full UI trail + report schema. |
| docs/AUDIT.md | Documents audit report versioning and stability guarantees. |
| .gitignore | Updates ignores for out dirs and Playwright artifacts. |
| .github/workflows/test.yml | Adds CLI lint gate and a Playwright demo-e2e job that checks out sibling demo repo. |
…s, spec-conformant F5 location

Review fixes (Copilot on #8):

- serve: body cap now counts BYTES (Buffer chunk lengths), not UTF-16 code units; `fake`/`noSteering` are strict booleans (a JSON "false" string no longer flips behavior or mis-records generation.ruleSteering); maxRepairs validated as a non-negative integer (400).
- CLI: --max-repairs validated (NaN/negative previously reached the orchestrator and could produce a zero-attempt report).
- core-boundary test splits on both path separators (cross-platform).
- forbidden-composition: the SPEC (v0.3 §5.3, merged) locates the finding at the offending descendant; the evaluator did the opposite of its own comment and the spec. Now spec-conformant: located at the offender, the message names the matching origin node. F5 golden regenerated; the historical audit-report evidence in docs/evidence/ is untouched (it documents runs under the prior shape, as recorded).

59 tests + flagship e2e green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
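The difference between counting bytes and counting UTF-16 code units, and the strict option validation, can be shown in a short sketch. Assumptions are flagged in the comments: 64KB is taken as 64 × 1024 bytes, the defaults for absent fields are guesses (the repair default of 2 is from the orchestrator description), and `BadRequestError`, `exceedsCap`, and `parseRunOptions` are hypothetical names. The sketch uses `TextEncoder` to stay runtime-neutral, whereas the commit message says the server sums Buffer chunk lengths.

```typescript
const MAX_BODY_BYTES = 64 * 1024; // assumed meaning of "64KB"

// string.length counts UTF-16 code units; the wire carries UTF-8 bytes.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Sum raw chunk sizes as they arrive and stop as soon as the cap is crossed.
function exceedsCap(chunks: Uint8Array[], cap: number = MAX_BODY_BYTES): boolean {
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.length;
    if (total > cap) return true;
  }
  return false;
}

class BadRequestError extends Error {} // would map to HTTP 400 in a server

interface RunOptions {
  fake: boolean;
  noSteering: boolean;
  maxRepairs: number;
}

// Only real booleans are accepted: the string "false" is truthy in JavaScript,
// so loose coercion would silently enable the flag.
function strictBoolean(value: unknown, name: string, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  if (typeof value !== "boolean") throw new BadRequestError(`${name} must be a boolean`);
  return value;
}

function parseRunOptions(body: unknown): RunOptions {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    throw new BadRequestError("request body must be a JSON object");
  }
  const fields = body as Record<string, unknown>;
  const maxRepairs = fields["maxRepairs"] === undefined ? 2 : fields["maxRepairs"]; // assumed default
  if (typeof maxRepairs !== "number" || !Number.isInteger(maxRepairs) || maxRepairs < 0) {
    throw new BadRequestError("maxRepairs must be a non-negative integer");
  }
  return {
    fake: strictBoolean(fields["fake"], "fake", false),
    noSteering: strictBoolean(fields["noSteering"], "noSteering", false),
    maxRepairs,
  };
}
```

A body of 40,000 "é" characters shows why the unit matters: it is 40,000 code units, under a 65,536 limit, but 80,000 UTF-8 bytes, over it.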
Mechanical landing PR: all content already reviewed and merged as #2–#5.

What happened (same failure mode as aestheticfunction/dspack-to-a2ui#7): the stacked PRs were merged into their stacked base branches, not main: #2→feat/context-compiler, #3→feat/surface-gates, #4→feat/adapters, #5→feat/pipeline. Only #1 and #6 actually reached main, so main currently has the PR-3 compiler plus docs/evidence and none of: the linter (S1–S3), adapters, orchestrator/audit report, serve, or the Playwright gate.

Additionally, #5 was merged at 814d425, just before the serve-hardening + pin-bump commit (a9dc0a9, the Copilot review response with all five threads resolved and both CI jobs green) landed on its head branch, so that commit is stranded too.

This PR's head is feat/demo-e2e: the complete linear stack including the hardening commit and the b47a2cf dep pin (= merged dspack-to-a2ui main). Content is identical to what was reviewed across #2–#5 plus the already-reviewed review-response commit. The #6 docs on main are untouched by this branch (no conflicts).

Suggested going forward: merge stacked PRs bottom-up with branch deletion enabled (GitHub then auto-retargets children to main), or retarget each child to main before merging.
🤖 Generated with Claude Code