Skip to content

feat: v0.4 evaluators — required-props + category-based forbidden-composition (PR-13)#23

Merged
ryandmonk merged 3 commits into
mainfrom
feat/v0.4-evaluators
Jul 3, 2026
Merged

feat: v0.4 evaluators — required-props + category-based forbidden-composition (PR-13)#23
ryandmonk merged 3 commits into
mainfrom
feat/v0.4-evaluators

Conversation

@ryandmonk

Copy link
Copy Markdown
Contributor

What

M3 plan Phase 1, PR-13 — the dspack-gen side of v0.4 (dspack#11): evaluators for the two new governance forms, violating fixtures with goldens, and the vendored-contract sync.

The seam held

The rule-type registry took required-props as one additive entry — no existing evaluator changed, except forbidden-composition gaining the spec'd optional forbiddenCategories field (v0.4 §4.2; existing fields untouched). Repair feedback needed zero changes: ADR-7's findings serialization is generic over types, now locked by a new F6a repair golden.

Hand-review surface

  • src/core/lint/rules.tsevaluateRequiredProps (direct-text/direct-prop checks; within scoping with the existence clause — every scope node must contain a label-bearer, finding at the scope node) and the forbiddenCategories block (membership resolved via categoryIndex at lint time; message names concrete id + category per spec).
  • Golden diffs (the review artifacts):
    • fixtures/golden/context/shadcn.destructive-action.json — rules 4–5 now render in steering; category steering resolves member ids ("alert-dialog, dialog, dropdown-menu") so the model sees concrete vocabulary.
    • fixtures/golden/eval/results.fake.jsoncontract sha/version lines only; every scripted fixture stays clean under the v0.4 rules (no metric moved).
    • New: F6a/F6b/F7 .expected.json + F6a.repair.txt.

Fixtures (the measured shapes, now lintable)

Fixture Shape Expected
F6a trigger > button > badge{text} — the 78/78 gate-failure signature 1 finding, rule.trigger-carries-label, AT the button
F6b button with no text (the take-2 shape) 1 finding, same rule
F7 dropdown-menu inside alert-dialog-content 1 finding, rule.alertdialog-no-nested-overlays, names id + category
(unit) trigger with no button at all existence-clause finding AT the trigger

Contract sync + rename

check-sync --write pulled the merged v0.4 contract (2.1.0); the fixture renamed shadcn.v0_3.dspack.jsonshadcn.v0_4.dspack.json (the name encodes the spec version) with all matrix configs, tests, the CLI default, and the check-sync manifest updated. Evidence directories untouched — their embedded matrix paths are historical records. This PR turns dspack-gen's drift check green again after dspack#11 (the planned Phase 1 sequence); ds-mcp/dspack-emit syncs are PR-14a/14b.

Acceptance

  • npx vitest run — 84/84 (incl. 4 new F6/F7 spot-checks, existence-clause unit, exit-4 unknown-type preserved)
  • node scripts/check-sync.mjs — in sync against merged dspack main
  • npm run demo:e2e — full flagship flow green against the v0.4 contract (local run required killing two 21-hour stale dev-servers from a previous session; noted for hygiene)
  • npm run build — clean

🤖 Generated with Claude Code

…position (PR-13)

M3 plan Phase 1, downstream of dspack v0.4 (dspack#11):

- required-props evaluator (spec v0.4 §4.1): direct-text / direct-prop checks,
  optional 'within' scope with the existence clause (every scope node must
  contain a label-bearer). Registered additively — the registry seam worked
  as designed; existing evaluators untouched except forbidden-composition,
  which gains the spec'd OPTIONAL forbiddenCategories field (§4.2), resolved
  through the contract's category index at lint time; findings name both the
  concrete offending id and the matched category.
- compiler steering for both new forms (category steering resolves member
  ids so the model sees concrete vocabulary); repair feedback needed ZERO
  changes (ADR-7's generic findings serialization held) — locked by a new
  F6a repair golden.
- fixtures: F6a (label nested — the 78/78 signature shape), F6b (label
  absent — the take-2 shape), F7 (nested overlay by category), each with
  expected-report goldens; within-existence covered by a unit test.
- vendored contract synced to v0.4 (2.1.0) and renamed
  fixtures/shadcn.v0_3.dspack.json -> shadcn.v0_4.dspack.json; all matrix
  configs, tests, cli default, and the check-sync manifest updated
  (evidence dirs untouched — historical records).
- goldens regenerated (context: rules 4–5 appear in steering; fake eval:
  contract sha/version only — scripted fixtures stay clean under v0.4).

84 tests, build, check:sync, demo-e2e all green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 3, 2026 19:43
The rename sweep missed three non-src references (.github/workflows/test.yml
×2, README ×2, scripts/smoke-ollama.ts); CI's 'Lint CLI gate' step failed on
the stale path while all 84 tests passed. F1 lint gate verified locally
(exit 2).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds dspack v0.4 governance support on the dspack-gen side by introducing the new required-props rule evaluator and extending forbidden-composition with optional category-based matching, then updates fixtures/goldens and all contract references to the v0.4 shadcn contract.

Changes:

  • Implement required-props evaluation in the S3 rule registry and add contract support for categories + forbiddenCategories.
  • Extend compiler steering text to render required-props and category-expanded forbidden-composition.
  • Update shadcn contract fixture to v0.4 (2.1.0) and refresh/add violating + repair goldens, plus bump all test/matrix references to the new fixture.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/serve.test.ts Updates server test to load the v0.4 shadcn contract fixture.
src/run/pipeline.test.ts Updates pipeline test contract fixture reference to v0.4.
src/repair/render.test.ts Adds a repair-message golden test for the new required-props finding type.
src/eval/eval.test.ts Updates eval CI-gate test contract fixture reference to v0.4.
src/core/lint/rules.ts Adds required-props evaluator and extends forbidden-composition with forbiddenCategories using contract category membership.
src/core/lint/lint.test.ts Updates lint golden coverage to include F6a/F6b/F7 and v0.4 contract fixture.
src/core/contract.ts Extends minimal contract types for v0.4 categories + new rule type, and adds categoryIndex().
src/core/compiler.ts Updates steering rendering to include required-props and category-expanded forbidden composition.
src/core/compiler.test.ts Updates compiler test to use the v0.4 contract fixture.
src/cli.ts Updates default CLI contract path to the v0.4 fixture.
src/adapters/adapters.test.ts Updates adapter tests to use the v0.4 contract fixture.
scripts/check-sync.mjs Updates the sync manifest to track the v0.4 fixture.
fixtures/shadcn.v0_4.dspack.json Vendors the v0.4 shadcn contract (2.1.0) including categories + new governance rules.
fixtures/golden/violating/F7-nested-overlay.expected.json Adds expected S3 output for category-based forbidden-composition violation.
fixtures/golden/violating/F7-nested-overlay.dsurface.json Adds violating surface fixture for nested overlay category violation.
fixtures/golden/violating/F6b-trigger-label-missing.expected.json Adds expected S3 output for required-props (missing direct text) violation variant.
fixtures/golden/violating/F6b-trigger-label-missing.dsurface.json Adds violating surface fixture for required-props label-missing variant.
fixtures/golden/violating/F6a-trigger-label-nested.expected.json Adds expected S3 output for required-props (label nested) violation variant.
fixtures/golden/violating/F6a-trigger-label-nested.dsurface.json Adds violating surface fixture for required-props label-nested variant.
fixtures/golden/repair/F6a.repair.txt Adds repair prompt golden for a required-props finding.
fixtures/golden/eval/results.fake.json Updates eval results golden for new contract sha/dspack version.
fixtures/golden/context/shadcn.destructive-action.json Updates context golden to include the new v0.4 governance rules and category-expanded steering.
eval/matrix.rerun.json Updates rerun matrix to point at the v0.4 contract fixture.
eval/matrix.rerun-local.json Updates local rerun matrix to point at the v0.4 contract fixture.
eval/matrix.rerun-hosted.json Updates hosted rerun matrix to point at the v0.4 contract fixture.
eval/matrix.rerun-gptoss.json Updates gpt-oss rerun matrix to point at the v0.4 contract fixture.
eval/matrix.json Updates live eval matrix to point at the v0.4 contract fixture.
eval/matrix.fake.json Updates CI fake matrix to point at the v0.4 contract fixture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/core/compiler.ts
Comment on lines +180 to +187
const scope = r.within ? `Every ${r.component} inside ${r.within}` : `Every ${r.component}`;
const needs: string[] = [];
if (r.requiredText) needs.push("carry its label as its own `text` field (never nested in a child component)");
for (const p of r.requiredProps ?? []) {
needs.push(p.oneOf ? `set ${p.prop} to one of ${p.oneOf.map(String).join("/")}` : `set ${p.prop} directly`);
}
const existence = r.within ? `; every ${r.within} must contain a ${r.component}` : "";
return `${scope} must ${needs.join(", and ")}${existence}.`;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 94a2081: when neither requiredText nor requiredProps yields a phrasable constraint (schema-invalid, but reachable on contract drift since dspack-gen does not schema-validate contracts), the steering falls back to the generic 'Follow rule .' line — same text as the unknown-type default — instead of emitting a malformed sentence. Note the same latent pattern exists in the v0.3 component-choice branch (empty require+forbid renders '.'); left untouched here since v0.3 rendering predates this PR — say the word if you want a follow-up sweep.

Comment thread src/core/compiler.ts
Comment on lines 152 to +164
const parts: string[] = [];
if (r.forbiddenDescendants?.length)
parts.push(`Never place ${r.forbiddenDescendants.join(" or ")} inside a ${r.component}`);
for (const category of r.forbiddenCategories ?? []) {
// Steering names the category AND its resolved member ids, so the
// model sees concrete vocabulary (mirrors the finding message).
const members = contract
? [...categoryIndex(contract)].filter(([, cats]) => cats.includes(category)).map(([id]) => id)
: [];
parts.push(
`never place ${category}-category components${members.length ? ` (${members.join(", ")})` : ""} inside ${r.component}`,
);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 94a2081: the index is computed once per rule (and only when forbiddenCategories is present), then reused across category entries.

…ty-needs steering

- categoryIndex computed once per rule instead of per forbiddenCategories
  entry (Copilot comment 2).
- required-props steering with no phrasable constraints (schema-invalid but
  possible on contract drift) falls back to the generic 'Follow rule …' line
  instead of a malformed 'must ;' sentence (Copilot comment 1).

Goldens unchanged (steering text identical for valid contracts); 84 tests
green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ryandmonk ryandmonk merged commit 6c84531 into main Jul 3, 2026
2 checks passed
@ryandmonk ryandmonk deleted the feat/v0.4-evaluators branch July 3, 2026 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants