Land the M1 stack on main (stacked PRs #2–#5 merged into their bases) #8

Merged
ryandmonk merged 9 commits into main from feat/demo-e2e on Jul 2, 2026

Conversation

@ryandmonk

Contributor

Mechanical landing PR: all content already reviewed and merged as #2–#5.

What happened (same failure mode as aestheticfunction/dspack-to-a2ui#7): the stacked PRs were merged into their stacked base branches rather than main (#2 into feat/context-compiler, #3 into feat/surface-gates, #4 into feat/adapters, #5 into feat/pipeline). Only #1 and #6 actually reached main, so main currently has the PR-3 compiler plus docs/evidence and none of the following: the linter (S1–S3), adapters, orchestrator/audit report, serve, or the Playwright gate.

Additionally, #5 was merged at 814d425, just before the serve-hardening + pin-bump commit (a9dc0a9, the Copilot review response with all five threads resolved and both CI jobs green) landed on its head branch — so that commit is stranded too.

This PR's head is feat/demo-e2e: the complete linear stack including the hardening commit and the b47a2cf dep pin (= merged dspack-to-a2ui main). Content is identical to what was reviewed across #2–#5 plus the already-reviewed review-response commit. The #6 docs on main are untouched by this branch (no conflicts).

Suggested going forward: merge stacked PRs bottom-up with branch deletion enabled (GitHub then auto-retargets children to main), or retarget each child to main before merging.

🤖 Generated with Claude Code

ryandmonk and others added 8 commits July 2, 2026 15:20
…luators (M1 PR-4)

- src/core/lint/: gate runner (S1 generic surface schema, S2 contract
  vocabulary, S3 governance) — independently reported, never implicit in
  generation (the S0 spike found Ollama's mlx engine silently ignoring
  `format`, which is why S2 is a check on the artifact).
- rule-type registry + evaluators per spec/dspack-v0.3.md §5.3 semantics.
  Findings carry both severity faces (requirement: must|should,
  level: error|warn); rationales verbatim; locations as $.root… paths.
  Unknown rule types throw UnknownRuleTypeError (CLI exit 4) — never skip.
- DEVIATION FROM THE M1 DIRECTIVE, flagged for review: forbidden-composition
  is implemented now (not M2/PR-8). Forced by a conflict discovered in
  implementation: the v0.3 shadcn contract carries a UNIVERSAL
  forbidden-composition rule (rule.button-no-interactive-descendants) and
  spec §5.4 forbids skipping unimplemented types — a two-evaluator linter
  would exit 4 on every lint of the real contract. Fixture F5 activates with
  it (all five golden fixtures active).
- CLI `lint`: JSON report on stdout (golden-comparable), human rendering on
  stderr; exits 0 clean / 2 any S-gate error / 4 unknown rule type.
- fixtures/golden/violating/F1-F5 + clean golden + checked-in expected
  reports; core-boundary test now walks recursively (lint/ included), ajv
  allowed as the only non-node bare import.

Verify: npm test; npx tsx src/cli.ts lint --dspack
fixtures/shadcn.v0_3.dspack.json --surface
fixtures/golden/violating/F1-dialog-for-delete.dsurface.json (exit 2, stdout
equals F1-dialog-for-delete.expected.json)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
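
The "never skip" behavior and the lint exit codes described in the commit above can be sketched as follows. This is a minimal illustration, not the code in src/core/lint/: the `Finding`, `Rule`, and `Evaluator` shapes and the `evaluateRules` / `lintExitCode` helpers are hypothetical names, and treating warn-only reports as exit 0 is an assumption (the commit only says "0 clean / 2 any S-gate error / 4 unknown rule type").

```typescript
// Hypothetical shapes for illustration; the real types live in src/core/lint/.
type Finding = {
  ruleId: string;
  requirement: "must" | "should";
  level: "error" | "warn";
  location: string; // a $.root… style path
  rationale: string;
};
type Rule = { id: string; type: string; requirement: "must" | "should"; rationale: string };
type Evaluator = (rule: Rule, surface: unknown) => Finding[];

class UnknownRuleTypeError extends Error {
  readonly ruleType: string;
  constructor(ruleType: string) {
    super(`unknown rule type: ${ruleType}`);
    this.name = "UnknownRuleTypeError";
    this.ruleType = ruleType;
  }
}

// Every rule must resolve to a registered evaluator. A missing evaluator is
// an error, never a silent skip.
function evaluateRules(rules: Rule[], surface: unknown, registry: Map<string, Evaluator>): Finding[] {
  const findings: Finding[] = [];
  for (const rule of rules) {
    const evaluate = registry.get(rule.type);
    if (!evaluate) throw new UnknownRuleTypeError(rule.type);
    findings.push(...evaluate(rule, surface));
  }
  return findings;
}

// Maps a lint run to the exit codes named in the commit message.
// Assumption: a report with only warn-level findings counts as clean.
function lintExitCode(run: () => Finding[]): number {
  try {
    return run().some((finding) => finding.level === "error") ? 2 : 0;
  } catch (err) {
    if (err instanceof UnknownRuleTypeError) return 4;
    throw err;
  }
}
```

With only some rule types registered, any contract carrying an unregistered type yields exit 4 on every lint, which matches the conflict the commit describes for a two-evaluator linter.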
…9 review fix)

Semantic alignment with the now-normative spec v0.3 §5: S2 and rule
resolution work by sub-component id alone, so duplicate ids across
components fail loudly — naming the id and every declaring component —
before any id-dependent check, instead of resolving by object iteration
order. Same error shape as the dspack validate harness; covered by a new
lint test. No golden outputs change (the shadcn contract has no duplicates).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
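
A duplicate-id check of the kind described above might look like the sketch below. The `Contract` shape and the `assertUniqueSubComponentIds` name are assumptions made for illustration. The real helper lives in src/core/contract.ts and its error shape follows the dspack validate harness, which is not reproduced here.

```typescript
// Hypothetical contract shape: components keyed by name, each optionally
// declaring sub-components that carry an id.
type Contract = {
  components: Record<string, { subComponents?: { id: string }[] }>;
};

// Fails loudly when two components declare the same sub-component id,
// naming the id and every declaring component, instead of letting later
// lookups resolve by object iteration order.
function assertUniqueSubComponentIds(contract: Contract): void {
  const declaredBy = new Map<string, string[]>();
  for (const [componentName, component] of Object.entries(contract.components)) {
    for (const sub of component.subComponents ?? []) {
      declaredBy.set(sub.id, [...(declaredBy.get(sub.id) ?? []), componentName]);
    }
  }
  const duplicates = Array.from(declaredBy.entries()).filter(([, owners]) => owners.length > 1);
  if (duplicates.length === 0) return;
  const detail = duplicates
    .map(([id, owners]) => `"${id}" is declared by ${owners.join(", ")}`)
    .join("; ");
  throw new Error(`duplicate sub-component ids: ${detail}`);
}
```

Running this before any id-dependent check keeps S2 and rule resolution from ever seeing an ambiguous id.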
… PR-5)

- src/adapters/: stateless GenerationAdapter interface per ADR-9 (one attempt
  per call; the repair loop owns conversation state). Model identity is
  configuration: constructors require an explicit model id, no default model
  name exists in code — enforced by a source-scan test.
- OllamaAdapter: /api/chat with format = the generation schema. Non-JSON
  output raises AdapterOutputError (the S0 spike found the mlx engine
  silently ignoring format; gates S1/S2 judge conformance over the artifact).
- AnthropicAdapter: official SDK, output_config.format json_schema (the
  generation schema is compatible by construction: depth-unrolled, closed
  objects). No sampling params sent (removed on current models). refusal and
  max_tokens stop reasons surface as typed errors — never silently retried.
- Offline deterministic tests via injected fetch fixtures: parsed results,
  schema round-trip verbatim into the request body, typed failures.
- scripts/smoke-ollama.ts (live, non-CI): one real generation through the
  compiled context + S1-S3 lint. First live runs recorded in the spike
  addendum: the 8B model passes S1/S2 but fails S3 on every attempt
  (nested-interactive violations) even with rule steering in the prompt —
  live confirmation that the guarantee is the linter, not the prompt.

Verify: npm test (44 tests, offline); npm run smoke:ollama -- --model <tag>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
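
The Ollama adapter's two responsibilities from the commit above (pass the generation schema verbatim as `format`, and raise a typed error on non-JSON output) can be sketched without any network access. The helper names `buildOllamaChatRequest` and `parseGeneration` are hypothetical; only the `/api/chat` body fields `model`, `messages`, `format`, and `stream` are taken from Ollama's public API.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

class AdapterOutputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AdapterOutputError";
  }
}

// Builds the JSON body for POST /api/chat. The model id is required
// configuration: there is deliberately no default model name here.
function buildOllamaChatRequest(
  model: string,
  messages: ChatMessage[],
  schema: Record<string, unknown>,
): { model: string; messages: ChatMessage[]; format: Record<string, unknown>; stream: false } {
  if (model.trim() === "") throw new Error("an explicit model id is required");
  return { model, messages, format: schema, stream: false };
}

// A runtime may ignore `format`, so the adapter only checks that the output
// is JSON at all. Conformance is left to the gates that run on the artifact.
function parseGeneration(content: string): unknown {
  try {
    return JSON.parse(content);
  } catch {
    throw new AdapterOutputError("model output is not valid JSON");
  }
}
```

Keeping these as pure functions is one way to make the offline tests deterministic: the request body can be compared against the schema directly, and failures surface as typed errors rather than retries.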
… PR-6)

- src/run/orchestrator.ts: generate → surface gates S1-S3 → bounded repair
  (default 2; system prompt immutable across attempts — the only delta is the
  model's own output + rendered repair feedback, snapshot per attempt) →
  emit via the pinned @aestheticfunction/dspack-to-a2ui git dependency →
  emitter gates A1-A3 (both A2UI versions) → audit report v1. Every outcome
  is a first-class artifact: passed / failed-lint-exhausted (exit 2) /
  failed-gate (exit 3) / failed-adapter (exit 1; added to the plan's enum —
  the S0 spike showed runtimes can fail to constrain, and that must be a
  reported outcome, never a silent retry).
- src/repair/render.ts (ADR-7): one findings object, two serializations —
  the repair message is rendered deterministically from the same findings
  embedded in the report, with linked examples verbatim as corrected
  references. Golden-file tested.
- src/audit/: report v1 + schemas/audit-report.v1.schema.json + markdown
  rendering + docs/AUDIT.md (additive-only guarantee, stable enums,
  reproducibility fields: contract sha256, schema sha256, adapter id,
  per-attempt model + provider meta).
- src/adapters/fake.ts: ScriptedAdapter — the deterministic instrument for
  CI, the demo's verification mode, and eval goldens.
- CLI `run` writes audit-report.json/.md + generated.surface.json; exit
  codes per the README table.

Live verification (recorded in out/ locally, not committed): qwen3:8b
violates S3 on attempt 1 (5 findings), repairs to 1, exhausts honestly
(failed-lint-exhausted); gpt-oss 20B passes S1-S3 + A1-A3 on attempt 1.

Verify: npm test (52 tests, offline/deterministic); npx tsx src/cli.ts run
--dspack fixtures/shadcn.v0_3.dspack.json --intent destructive-action
--prompt "a screen to delete my account" --model ollama:<tag> --out out

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
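
The control flow of the orchestrator described above can be sketched as a bounded loop. This is a simplified, synchronous illustration under several assumptions: `runPipeline`, `renderRepair`, and the option names are hypothetical, findings are reduced to strings, "default 2" is read as two repairs (up to three attempts), and exit code 0 for `passed` is assumed since the commit defers to the README table.

```typescript
type Outcome = "passed" | "failed-lint-exhausted" | "failed-gate" | "failed-adapter";
// Exit codes 2, 3, and 1 are from the commit message; 0 for passed is an assumption.
const EXIT_CODES: Record<Outcome, number> = {
  passed: 0,
  "failed-lint-exhausted": 2,
  "failed-gate": 3,
  "failed-adapter": 1,
};

type Message = { role: "system" | "user" | "assistant"; content: string };
type Generate = (messages: readonly Message[]) => string; // one attempt per call
type Lint = (surface: string) => string[]; // S1-S3 findings, simplified to strings

// Placeholder rendering; the real repair message comes from src/repair/render.ts.
const renderRepair = (findings: string[]): string => `Fix these findings:\n- ${findings.join("\n- ")}`;

function runPipeline(opts: {
  system: string;
  prompt: string;
  generate: Generate;
  lint: Lint;
  emitterGatesPass: (surface: string) => boolean; // stands in for emit + A1-A3
  maxRepairs?: number;
}): { outcome: Outcome; attempts: { index: number; findings: string[] }[] } {
  const maxRepairs = opts.maxRepairs ?? 2;
  const messages: Message[] = [
    { role: "system", content: opts.system },
    { role: "user", content: opts.prompt },
  ];
  const attempts: { index: number; findings: string[] }[] = [];
  for (let i = 0; i <= maxRepairs; i++) {
    let output: string;
    try {
      output = opts.generate(messages);
    } catch {
      return { outcome: "failed-adapter", attempts }; // reported, never silently retried
    }
    const findings = opts.lint(output);
    attempts.push({ index: i + 1, findings });
    if (findings.length === 0) {
      return { outcome: opts.emitterGatesPass(output) ? "passed" : "failed-gate", attempts };
    }
    // The system prompt stays untouched. The only delta between attempts is
    // the model's own output plus the rendered repair feedback.
    messages.push({ role: "assistant", content: output }, { role: "user", content: renderRepair(findings) });
  }
  return { outcome: "failed-lint-exhausted", attempts };
}
```

Because the loop owns the conversation state, the adapter passed in as `generate` can stay stateless, as the adapter commit requires.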
…R-7)

- src/serve.ts: `dspack-gen serve` — localhost-only node:http endpoint
  (incidental plumbing, no framework). POST /run streams PipelineEvent
  NDJSON (start with applicable rule ids / attempt with S1-S3 gates +
  findings / repair message verbatim / emitted A1-A3 / done with the full
  audit report v1). fake:true runs the deterministic ScriptedAdapter
  (golden violating fixture F1 → the contract's worked example); live mode
  requires an explicit model reference — no default model in code.
- orchestrator: observational onEvent hook (the report stays the artifact).
- e2e/flagship.spec.ts (`npm run demo:e2e`): drives the demo app's Generate
  view against serve in fake mode and asserts the entire flagship trail —
  violation with verbatim rationale, exact repair message, clean attempt
  with rule.alertdialog-requires-cancel listed as VERIFIED, A1-A3 green for
  both A2UI versions, the AlertDialog rendering + opening with
  cancel-before-confirm (and closing on Cancel), and the downloaded audit
  report validating against schemas/audit-report.v1.schema.json.
  DEMO_DIR points at a dspack-to-a2ui checkout with the Generate view
  (CI: sibling checkout).

Verify: npm test; DEMO_DIR=<dspack-to-a2ui checkout> npm run demo:e2e

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
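
Since POST /run streams events as NDJSON (one JSON value per line), a consumer has to cope with chunks that split a line in the middle. The sketch below shows one way to do that. The `type` discriminator and its values are inferred from the event list in the commit message and may not match the actual `PipelineEvent` definition; `encodeEvent` and `createNdjsonDecoder` are hypothetical names.

```typescript
// Assumed event shape: a discriminator plus arbitrary payload fields.
type PipelineEvent = {
  type: "start" | "attempt" | "repair" | "emitted" | "done";
  [key: string]: unknown;
};

// One event per line. JSON.stringify never emits a raw newline, so the
// line terminator is unambiguous.
const encodeEvent = (event: PipelineEvent): string => JSON.stringify(event) + "\n";

// Incremental decoder: buffers partial lines across chunk boundaries and
// calls onValue once per complete line.
function createNdjsonDecoder<T>(onValue: (value: T) => void) {
  let buffer = "";
  return {
    push(chunk: string): void {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline !== -1) {
        const line = buffer.slice(0, newline).trim();
        buffer = buffer.slice(newline + 1);
        if (line !== "") onValue(JSON.parse(line) as T);
        newline = buffer.indexOf("\n");
      }
    },
    end(): void {
      if (buffer.trim() !== "") throw new Error("truncated NDJSON stream");
    },
  };
}
```

A client would call `push` for every decoded text chunk of the response body and `end` once the stream closes.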
…ged main

Review fixes (Copilot on #5):
- CORS restricted to the demo dev-server origins (localhost/127.0.0.1:5173,
  reflected with vary: origin) with allow-methods on preflight — no more
  wildcard readable by arbitrary sites while the server runs.
- Request bodies capped at 64KB (413 + destroy) — no unbounded buffering.
- Fake mode selects the worked example BY INTENT and fails fast (400) when
  the contract has none — never scripts `undefined`.
- onEvent is observational by enforcement: a throwing hook (e.g. stream
  write after client disconnect) is swallowed and can never change a run's
  outcome. Covered by a pipeline test.
- --port validated (integer 1-65535) with a clear usage error.
- New serve.test.ts: CORS allow/deny, 413, fake-no-example 400, full NDJSON
  event sequence (59 tests total).

Also bumps the pinned @aestheticfunction/dspack-to-a2ui to b47a2cf (merged
main incl. the #6 emitter review fixes and #7 Generate-view landing) — the
single pin bump confirming the dep matches merged main, as directed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
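
Three of the hardening fixes above are small enough to sketch as pure functions: the origin allow-list, `--port` validation, and the observational event hook. All names here (`corsHeaders`, `parsePort`, `observational`) are hypothetical. The `http://` scheme on the allowed origins and the exact allow-methods value are assumptions, since the commit names only the hosts and port.

```typescript
// Assumed allow-list: the demo dev-server origins named in the commit.
const ALLOWED_ORIGINS = new Set(["http://localhost:5173", "http://127.0.0.1:5173"]);

// Reflects an allowed origin (with vary: origin) and emits nothing for any
// other origin, so arbitrary sites cannot read responses.
function corsHeaders(origin: string | undefined, preflight = false): Record<string, string> {
  if (origin === undefined || !ALLOWED_ORIGINS.has(origin)) return {};
  const headers: Record<string, string> = { "access-control-allow-origin": origin, vary: "origin" };
  if (preflight) headers["access-control-allow-methods"] = "POST, OPTIONS"; // assumed method list
  return headers;
}

// Accepts only a plain decimal integer in 1-65535.
function parsePort(raw: string): number {
  const port = Number(raw);
  if (!/^\d+$/.test(raw) || port < 1 || port > 65535) {
    throw new Error(`--port must be an integer 1-65535, got "${raw}"`);
  }
  return port;
}

// Wraps a hook so that a throwing observer (for example a stream write after
// the client disconnected) can never change the outcome of a run.
function observational<E>(hook?: (event: E) => void): (event: E) => void {
  return (event) => {
    try {
      hook?.(event);
    } catch {
      // Swallowed on purpose: the audit report is the artifact, not the stream.
    }
  };
}
```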
Copilot AI review requested due to automatic review settings July 2, 2026 21:34

Copilot AI left a comment

Pull request overview

This PR mechanically lands the full “M1 stack” onto main (previously merged into stacked base branches), bringing in the surface-gates linter (S1–S3), generation adapters, the pipeline orchestrator + audit report artifact, the local serve endpoint, and the Playwright flagship demo gate.

Changes:

  • Adds surface gates S1–S3 (schema + vocabulary + rule evaluators) with golden fixtures and a dspack-gen lint CLI.
  • Introduces generation adapters (Ollama/Anthropic + deterministic scripted adapter) and the end-to-end pipeline orchestrator that produces audit report v1.
  • Adds dspack-gen serve plus a Playwright e2e gate and workflow wiring for CI.

Reviewed changes

Copilot reviewed 44 out of 46 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| vitest.config.ts | Configures Vitest to run unit/gate tests under src/**. |
| src/serve.ts | Adds localhost-only NDJSON streaming pipeline endpoint with basic hardening. |
| src/serve.test.ts | Tests CORS restrictions, body size cap, fake-mode behavior, and NDJSON sequence. |
| src/run/pipeline.test.ts | Deterministic acceptance tests for orchestrator success + failure paths and report validity. |
| src/run/orchestrator.ts | Implements generate→lint→repair→emit→validate pipeline and report construction. |
| src/repair/render.ts | Renders deterministic repair feedback from lint findings (ADR-7). |
| src/index.ts | Expands public root exports to include adapters/orchestrator/audit/repair. |
| src/core/surface-schema.ts | Vendors the surface v0.1 JSON schema used by gate S1. |
| src/core/lint/walk.ts | Adds surface tree traversal helpers shared by S2/S3. |
| src/core/lint/vocabulary.ts | Implements gate S2 (contract vocabulary validation). |
| src/core/lint/rules.ts | Implements gate S3 rule registry + evaluators and UnknownRuleTypeError. |
| src/core/lint/lint.test.ts | Adds golden-based acceptance tests for S1–S3 behavior and independence. |
| src/core/lint/index.ts | Wires S1–S3 linting with AJV and summarization. |
| src/core/lint/findings.ts | Defines Finding/LintReport shapes and deterministic text rendering. |
| src/core/index.ts | Exposes linting APIs from the ./core subpath. |
| src/core/core-boundary.test.ts | Updates core-boundary scan to include recursive module traversal + allowed imports. |
| src/core/contract.ts | Adds duplicate sub-component id detection helper used by S2. |
| src/cli.ts | Adds lint, run, and serve commands; writes audit artifacts for run. |
| src/audit/report.ts | Adds audit report v1 types, hashing utilities, and markdown rendering. |
| src/adapters/types.ts | Defines adapter interface, typed adapter errors, and model-ref parsing. |
| src/adapters/ollama.ts | Implements Ollama adapter using structured outputs (format). |
| src/adapters/index.ts | Exports adapters and provides adapterFor(modelRef) factory. |
| src/adapters/fake.ts | Adds deterministic scripted adapter for tests/demo/e2e. |
| src/adapters/anthropic.ts | Implements Anthropic adapter using SDK structured outputs. |
| src/adapters/adapters.test.ts | Offline deterministic adapter tests with injected fetch fixtures. |
| scripts/smoke-ollama.ts | Adds non-CI live Ollama smoke script (generation + S1–S3). |
| schemas/audit-report.v1.schema.json | Introduces JSON schema for audit report v1 artifact. |
| playwright.config.ts | Adds Playwright config to run demo e2e against serve and sibling demo repo. |
| package.json | Adds pinned emitter dependency, Anthropic SDK, and Playwright scripts/deps. |
| package-lock.json | Locks new dependencies including git-pinned emitter and Playwright. |
| fixtures/golden/violating/F5-nested-interactive.expected.json | Adds golden expected report for F5 violating fixture. |
| fixtures/golden/violating/F5-nested-interactive.dsurface.json | Adds F5 violating surface fixture. |
| fixtures/golden/violating/F4-missing-title.expected.json | Adds golden expected report for F4 violating fixture. |
| fixtures/golden/violating/F4-missing-title.dsurface.json | Adds F4 violating surface fixture. |
| fixtures/golden/violating/F3-missing-cancel.expected.json | Adds golden expected report for F3 violating fixture. |
| fixtures/golden/violating/F3-missing-cancel.dsurface.json | Adds F3 violating surface fixture. |
| fixtures/golden/violating/F2-no-confirmation.expected.json | Adds golden expected report for F2 violating fixture. |
| fixtures/golden/violating/F2-no-confirmation.dsurface.json | Adds F2 violating surface fixture. |
| fixtures/golden/violating/F1-dialog-for-delete.expected.json | Adds golden expected report for F1 violating fixture. |
| fixtures/golden/violating/F1-dialog-for-delete.dsurface.json | Adds F1 violating surface fixture. |
| fixtures/golden/repair/F1.repair.txt | Adds golden repair-message text for F1. |
| fixtures/golden/clean/delete-account.dsurface.json | Adds clean golden surface fixture (worked example). |
| e2e/flagship.spec.ts | Adds Playwright flagship end-to-end test validating full UI trail + report schema. |
| docs/AUDIT.md | Documents audit report versioning and stability guarantees. |
| .gitignore | Updates ignores for out dirs and Playwright artifacts. |
| .github/workflows/test.yml | Adds CLI lint gate and a Playwright demo-e2e job that checks out sibling demo repo. |

Comment thread src/serve.ts
Comment thread src/cli.ts
Comment thread src/core/core-boundary.test.ts
Comment thread src/core/lint/rules.ts
Comment thread src/serve.ts Outdated
Comment thread src/serve.ts
…s, spec-conformant F5 location

Review fixes (Copilot on #8):
- serve: body cap now counts BYTES (Buffer chunk lengths), not UTF-16 code
  units; `fake`/`noSteering` are strict booleans (a JSON "false" string no
  longer flips behavior or mis-records generation.ruleSteering); maxRepairs
  validated as a non-negative integer (400).
- CLI: --max-repairs validated (NaN/negative previously reached the
  orchestrator and could produce a zero-attempt report).
- core-boundary test splits on both path separators (cross-platform).
- forbidden-composition: the SPEC (v0.3 §5.3, merged) locates the finding at
  the offending descendant — the evaluator did the opposite of its own
  comment and the spec. Now spec-conformant: located at the offender, the
  message names the matching origin node. F5 golden regenerated; the
  historical audit-report evidence in docs/evidence/ is untouched (it
  documents runs under the prior shape, as recorded).

59 tests + flagship e2e green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
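
The byte-versus-code-unit distinction and the strict input validation from the review fixes above can be illustrated as follows. The helper names are hypothetical, and defaulting an absent `fake` flag to false and an absent `maxRepairs` to 2 are assumptions. The sketch uses `TextEncoder` to count UTF-8 bytes, whereas the commit counts Buffer chunk lengths on the incoming stream.

```typescript
const MAX_BODY_BYTES = 64 * 1024; // the 64KB cap from the serve hardening commit

// string.length counts UTF-16 code units. The cap has to count encoded bytes.
const utf8ByteLength = (text: string): number => new TextEncoder().encode(text).length;

const exceedsBodyCap = (text: string): boolean => utf8ByteLength(text) > MAX_BODY_BYTES;

// Only real JSON booleans are accepted, so the string "false" can no longer
// be mistaken for a truthy flag.
function strictBoolean(value: unknown, field: string): boolean {
  if (value === undefined) return false; // assumed default when the field is absent
  if (typeof value !== "boolean") throw new Error(`${field} must be a boolean`);
  return value;
}

// Rejects NaN, negatives, and fractions before they reach the orchestrator.
function parseMaxRepairs(value: unknown, fallback = 2): number {
  if (value === undefined) return fallback;
  if (typeof value !== "number" || !Number.isInteger(value) || value < 0) {
    throw new Error("maxRepairs must be a non-negative integer");
  }
  return value;
}
```

For example, a body of 40,000 "é" characters has a `length` of 40,000, which is under the cap when counted in code units, but it encodes to 80,000 UTF-8 bytes and is therefore over the cap.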
ryandmonk merged commit 691566a into main on Jul 2, 2026
2 checks passed
ryandmonk deleted the feat/demo-e2e branch on July 5, 2026 02:34
2 participants