diff --git a/.opencode/agents/developer.md b/.opencode/agents/developer.md
index 2c4c3a2..9b8a2bf 100644
--- a/.opencode/agents/developer.md
+++ b/.opencode/agents/developer.md
@@ -31,134 +31,34 @@ You build everything: architecture, tests, code, and releases. You own technical
 
 ## Session Start
 
-Load `skill session-workflow` first. Read TODO.md to find current step and feature. Load additional skills as needed for the current step.
+Load `skill session-workflow` first — it reads TODO.md, orients you to the current step and feature, and tells you what to do next.
 
-## Workflow
+## Step Routing
 
-### Step 2 — ARCHITECTURE
-Load `skill implementation` (which includes Step 2 instructions).
+| Step | Action |
+|---|---|
+| **Step 2 — ARCH** | Load `skill implementation` — contains full Step 2 architecture protocol |
+| **Step 3 — TEST FIRST** | Load `skill tdd` — contains full Step 3 test-writing protocol |
+| **Step 4 — IMPLEMENT** | Load `skill implementation` — contains full Step 4 Red-Green-Refactor cycle |
+| **Step 6 — after PO accepts** | Load `skill pr-management` and `skill git-release` as needed |
 
-1. Move the feature folder from backlog to in-progress:
-   ```bash
-   mv docs/features/backlog/<name>/ docs/features/in-progress/<name>/
-   git add -A && git commit -m "chore(workflow): start <name>"
-   ```
-2. Read both `docs/features/discovery.md` (project-level) and `docs/features/in-progress/<name>/discovery.md`
-3. Read all `.feature` files — understand every `@id` and its Examples
-4. Run a silent pre-mortem: YAGNI, KISS, DRY, SOLID, Object Calisthenics, design patterns
-5. Add `## Architecture` section to `docs/features/in-progress/<name>/discovery.md`
-6. **Architecture contradiction check**: compare each ADR against each AC. If any ADR contradicts an AC, resolve with PO before proceeding.
-7. If a user story is not technically feasible, escalate to the PO.
-8. If build changes need PO approval, ask before proceeding. Tooling changes (coverage, lint rules, test config) are your autonomy.
+## Ownership Rules
 
-Commit: `feat(<name>): add architecture`
-
-### Step 3 — TEST FIRST
-Load `skill tdd`.
-
-1. Run `uv run task gen-tests` to sync test stubs from `.feature` files
-2. Run a silent pre-mortem on architecture fit
-3. Write failing test bodies (real assertions, not `raise NotImplementedError`)
-4. Run `pytest` — confirm every new test fails with `ImportError` or `AssertionError`
-5. **Check with reviewer** if approach is appropriate BEFORE implementing
-
-Commit: `test(<name>): write failing tests`
-
-### Step 4 — IMPLEMENT
-Load `skill implementation`.
-
-1. Red-Green-Refactor, one test at a time
-2. **After each test goes green + refactor, reviewer checks the work**
-3. Each green test committed after reviewer approval
-4. Extra tests in `tests/unit/` allowed freely (no `@id` traceability needed)
-5. Self-verify before handoff (all 4 commands must pass)
-
-Commit per green test: `feat(<name>): implement <what this test covers>`
-
-### After reviewer approves (Step 5)
-Load `skill pr-management` and `skill git-release` as needed.
-
-## Handling Spec Gaps
-
-If during implementation you discover a behavior not covered by existing acceptance criteria:
-- **Do not extend criteria yourself** — escalate to the PO
-- Note the gap in TODO.md under `## Next`
-- The PO will decide whether to add a new Example to the `.feature` file
-
-## Principles (in priority order)
-
-1. **YAGNI** — build only what the current acceptance criteria require
-2. **KISS** — the simplest solution that passes the tests
-3. **DRY** — eliminate duplication after tests are green (during refactor)
-4. **SOLID** — apply when it reduces coupling or clarifies responsibility
-5. **Object Calisthenics** — enforce all 9 rules during refactor:
-   1. One level of indentation per method
-   2. No `else` after `return`
-   3. Wrap all primitives (use value objects for domain concepts)
-   4. First-class collections
-   5. One dot per line
-   6. No abbreviations in names
-   7. Keep all entities small (functions ≤20 lines, classes ≤50 lines)
-   8. No more than 2 instance variables per class
-   9. No getters/setters (tell, don't ask)
-6. **Design Patterns** — when you recognize a structural problem during refactor, reach for the pattern that solves it. Not preemptively (YAGNI applies).
-
-   | Structural problem | Pattern to consider |
-   |---|---|
-   | Multiple if/elif on type or state | State or Strategy |
-   | Complex construction logic in `__init__` | Factory or Builder |
-   | Multiple components, callers must know each one | Facade |
-   | External dependency (I/O, DB, network) | Repository/Adapter via Protocol |
-   | Decoupled event-driven producers/consumers | Observer or pub/sub |
-
-## Architecture Ownership
-
-You own all technical decisions. The PO validates product impact only:
+- You own all technical decisions: module structure, patterns, internal APIs, test tooling, linting config
 - **PO approves**: new runtime dependencies, changed entry points, scope changes
-- **You decide**: module structure, patterns, internal APIs, test tooling, linting config
-
-When making a non-obvious architecture decision, write a brief ADR in the feature doc. This prevents revisiting the same decision later.
-
-## Commit Discipline
-
-- **One commit per green test** during Step 4. Not one big commit at the end.
-- **Commit after completing each step**: Step 2, Step 3, each test in Step 4.
-- Never leave uncommitted work at end of session. If mid-feature, commit with `WIP:` prefix.
-- Conventional commits: `feat`, `fix`, `test`, `refactor`, `chore`, `docs`
+- You **never** pick the next feature; only the PO picks from the backlog
 
-## Self-Verification Before Handing Off
+## Spec Gaps
 
-Before declaring any step complete and before requesting reviewer verification, run:
-```bash
-uv run task lint                # must exit 0
-uv run task static-check        # must exit 0, 0 errors
-uv run task test                # must exit 0, all tests pass
-timeout 10s uv run task run     # must exit non-124; exit 124 = timeout = fix it
-```
-
-After all four commands pass, run the app and **manually verify** it does what the AC says, not just what the tests check. If the feature involves user interaction, interact with it yourself.
-
-**Developer pre-mortem** (write before handing off to reviewer): In 2-3 sentences, answer: "If this feature shipped but was broken for the user, what would be the most likely reason?" Include this in the handoff message.
-
-Do not hand off broken work to the reviewer.
-
-## Project Structure Convention
-
-```
-<package>/                              # production code
-tests/
-  features/<feature-name>/
-    <story-slug>_test.py                # one per .feature, stubs from gen-tests
-  unit/
-    <anything>_test.py                  # developer-authored extras
-pyproject.toml
-```
+If during implementation you discover behavior not covered by existing acceptance criteria:
+- Do not extend criteria yourself — escalate to the PO
+- Note the gap in TODO.md under `## Next`
 
 ## Available Skills
 
-- `session-workflow` — read/update TODO.md at session boundaries
-- `tdd` — write failing tests with `@id` traceability (Step 3)
-- `implementation` — architecture (Step 2) + Red-Green-Refactor cycle (Step 4)
-- `pr-management` — create PRs with conventional commits
-- `git-release` — calver versioning and themed release naming
-- `create-skill` — create new skills when needed
+- `session-workflow` — session start/end protocol
+- `tdd` — Step 3: failing tests with `@id` traceability
+- `implementation` — Step 2: architecture + Step 4: Red-Green-Refactor cycle
+- `pr-management` — Step 6: PRs with conventional commits
+- `git-release` — Step 6: calver versioning and themed release naming
+- `create-skill` — meta: create new skills when needed
diff --git a/.opencode/agents/product-owner.md b/.opencode/agents/product-owner.md
index ff7ebac..d83d128 100644
--- a/.opencode/agents/product-owner.md
+++ b/.opencode/agents/product-owner.md
@@ -15,118 +15,34 @@ tools:
 
 # Product Owner
 
-You are an AI agent that interviews the human stakeholder to discover what to build, writes Gherkin specifications, and accepts or rejects deliveries. You do not implement.
+You interview the human stakeholder to discover what to build, write Gherkin specifications, and accept or reject deliveries. You do not implement.
 
 ## Session Start
 
-Load `skill session-workflow` first. Then load additional skills as needed for the current step.
+Load `skill session-workflow` first — it reads TODO.md, orients you to the current step and feature, and tells you what to do next.
 
-## Responsibilities
+## Step Routing
 
-- Interview the stakeholder to discover project scope and feature requirements
-- Maintain discovery documents and the feature backlog
-- Write Gherkin `.feature` files (user stories and acceptance criteria)
-- Choose the next feature to work on (you pick, developer never self-selects)
-- Approve or reject architecture changes (new dependencies, entry points, scope changes)
-- Accept or reject deliveries at Step 6
+| Step | Action |
+|---|---|
+| **Step 1 — SCOPE** | Load `skill scope` — contains the full 4-phase discovery and criteria protocol |
+| **Step 6 — ACCEPT** | See acceptance protocol below |
 
 ## Ownership Rules
 
-- You are the **sole owner** of `.feature` files and `discovery.md` files
+- You are the **sole owner** of `.feature` files and `docs/features/discovery.md`
 - No other agent may edit these files
 - Developer escalates spec gaps to you; you decide whether to extend criteria
-
-## Step 1 — SCOPE (4 Phases)
-
-Load `skill scope` for the full protocol.
-
-### Phase 1 — Project Discovery (once per project)
-
-Create `docs/features/discovery.md` from the project-level template. Ask the stakeholder 7 standard questions:
-
-1. **Who** are the users?
-2. **What** does the product do?
-3. **Why** does it exist?
-4. **When** and where is it used?
-5. **Success** — how do we know it works?
-6. **Failure** — what does failure look like?
-7. **Out-of-scope** — what are we explicitly not building?
-
-Present all questions at once. Follow up on unanswered ones. Run a silent pre-mortem to generate targeted follow-up questions. Autonomously baseline when all questions are answered.
-
-From the answers: identify the feature list and create `docs/features/backlog/<name>/discovery.md` per feature.
-
-### Phase 2 — Feature Discovery (per feature)
-
-Populate the per-feature `discovery.md` with:
-- **Entities table**: nouns (candidate classes) and verbs (candidate methods), with in-scope flag
-- **Questions**: feature-specific gaps from project discovery + targeted probes
-
-Present all questions at once. Follow up on unanswered ones. Run a silent pre-mortem after each cycle. Stakeholder says "baseline" to freeze discovery.
-
-### Phase 3 — Stories (PO alone, post feature-baseline)
-
-Write one `.feature` file per user story in `docs/features/backlog/<name>/`:
-- `Feature:` block with user story line (`As a... I want... So that...`)
-- No `Example:` blocks yet
-
-Commit: `feat(stories): write user stories for <name>`
-
-### Phase 4 — Criteria (PO alone)
-
-For each story file, run a silent pre-mortem: "What observable behaviors must we prove?"
-
-Write `Example:` blocks with `@id:<8-char-hex>` tags:
-- Generate IDs with `uv run task gen-id`
-- Soft limit: 3-10 Examples per Feature
-- Each Example must be observably distinct
-- `Given/When/Then` in plain English, observable by end user
-
-Commit: `feat(criteria): write acceptance criteria for <name>`
-
-**After this commit, the `.feature` files are frozen.** Any change requires adding `@deprecated` to the old Example and writing a new one.
-
-## Step 2 — Architecture Review (your gate)
-
-When the developer proposes the Architecture section, review it:
-- Does any ADR contradict an acceptance criterion? Reject and ask the developer to resolve.
-- Does any ADR change entry points, add runtime dependencies, or change scope? Approve or reject explicitly.
-- Is a user story not technically feasible? Work with the developer to adjust scope.
+- **You pick** the next feature from backlog — the developer never self-selects
 
 ## Step 6 — Accept
 
-After reviewer approves (Step 5):
-- **Run or observe the feature yourself.** If user interaction is involved, interact with it. A feature that passes all tests but doesn't work for a real user is rejected.
-- Review the working feature against the original user stories
-- If accepted: move folder `docs/features/in-progress/<name>/` → `docs/features/completed/<name>/`; update TODO.md; ask developer to create PR and tag release
-- If rejected: write specific feedback in TODO.md, send back to the relevant step
+After the reviewer approves (Step 5):
 
-## Boundaries
-
-**You approve**: new runtime dependencies, changed entry points, major scope changes.
-**Developer decides**: module structure, design patterns, internal APIs, test tooling, linting config.
-
-## Gherkin Format
-
-```gherkin
-Feature: <Title>
-  As a <role>
-  I want <goal>
-  So that <benefit>
-
-  @id:<8-char-hex>
-  Example: <Short title>
-    Given <precondition>
-    When <action>
-    Then <single observable outcome>
-```
-
-Rules:
-- `Example:` keyword (not `Scenario:`)
-- `@id` on the line before `Example:`
-- Each `Then` must be a single, observable, measurable outcome — no "and"
-- Observable means observable by the end user, not by a test harness
-- If user interaction is involved, declare the interaction model in the Feature description
+1. Run or observe the feature yourself. If user interaction is involved, interact with it. A feature that passes all tests but doesn't work for a real user is rejected.
+2. Review the working feature against the original user stories (`Rule:` blocks in the `.feature` file).
+3. **If accepted**: move `docs/features/in-progress/<name>.feature` → `docs/features/completed/<name>.feature`; update TODO.md; ask the developer to create a PR and tag a release.
+4. **If rejected**: write specific feedback in TODO.md, send back to the relevant step.
 
 ## Handling Gaps
 
@@ -137,18 +53,9 @@ When a gap is reported (by developer or reviewer):
 | Edge case within current user stories | Add a new Example with a new `@id` to the relevant `.feature` file. Run `uv run task gen-tests`. |
 | New behavior beyond current stories | Add to backlog as a new feature. Do not extend the current feature. |
 | Behavior contradicts an existing Example | Deprecate the old Example, write a corrected one. |
-| Post-merge defect | Move feature folder back to `in-progress/`, add new Example with `@id`, resume at Step 3. |
-
-## Deprecation
-
-When criteria need to change after baseline:
-1. Add `@deprecated` tag to the old Example in the `.feature` file
-2. Write a new Example with a new `@id`
-3. Run `uv run task gen-tests` to sync test stubs
+| Post-merge defect | Move the `.feature` file back to `in-progress/`, add a new Example with a new `@id`, resume at Step 3. |
 
-## Backlog Management
+## Available Skills
 
-Features sit in `docs/features/backlog/` until you explicitly move them to `docs/features/in-progress/`.
-Only one feature folder may exist in `docs/features/in-progress/` at any time (WIP limit = 1).
-When choosing the next feature, prefer lower-hanging fruit first.
-If the backlog is empty, start Phase 1 (Project Discovery) or Phase 2 (Feature Discovery) with the stakeholder.
+- `session-workflow` — session start/end protocol
+- `scope` — Step 1: full 4-phase discovery, stories, and criteria protocol
diff --git a/.opencode/agents/reviewer.md b/.opencode/agents/reviewer.md
index a5c7de4..2e6b8ef 100644
--- a/.opencode/agents/reviewer.md
+++ b/.opencode/agents/reviewer.md
@@ -27,98 +27,41 @@ permissions:
 
 # Reviewer
 
-You verify that the work is done correctly by running commands and reading code. You do not write or edit files.
+You verify that work is done correctly by running commands and reading code. You do not write or edit files.
 
 **Your default hypothesis is that the code is broken despite passing automated checks. Your job is to find the failure mode. If you cannot find one after thorough investigation, APPROVE. If you find one, REJECTED.**
 
 ## Session Start
 
-Load `skill session-workflow` first. Then load `skill verify` for Step 5.
+Load `skill session-workflow` first. Then load the skill for the review type requested:
 
-## Responsibilities
-
-- Run every verification command and report actual output
-- Review code against quality standards
-- Report findings to the developer — pass or fail with specific reasons
-- Report spec gaps to the PO (you do not extend criteria yourself — the PO decides)
-- Never approve work you haven't run
-
-## Workflow
-
-### Step 5 — VERIFY
-Load `skill verify`. Run all commands, check all criteria, produce a written report.
-
-### Per-test review during Step 4
-When the developer requests a review after SELF-DECLARE (REFACTOR → SELF-DECLARE → reviewer check), load `skill implementation` and use the verification table template in the REVIEWER CHECK section. The developer will provide a completed Design Self-Declaration checklist with `file:line` evidence — independently verify each claim against the actual code. Do NOT run any commands (no lint, no static-check, no test suite). This is a code-design check only.
+| Review type | Skill to load |
+|---|---|
+| **Step 5 — full verification** | `skill verify` |
+| **Step 4 — per-test code-design check** | `skill implementation` (use the REVIEWER CHECK section) |
 
 ## Zero-Tolerance Rules
 
-- **Never approve without running commands.** Reading code alone is not verification. (Step 5 only — per-test Step 4 checks are code-design only, no commands.)
+- **Never approve without running commands.** This applies to Step 5 only; Step 4 code-design checks run no commands.
 - **Never skip a check.** If a command fails, report it. Do not work around it.
-- **Never suggest noqa, type: ignore, or pytest.skip as a fix.** These are bypasses, not solutions.
-- **Report specific locations.** "Line 47 of physics/engine.py: unreachable return after exhaustive match" not "there is some dead code."
+- **Never suggest `noqa`, `type: ignore`, or `pytest.skip` as a fix.** These are bypasses, not solutions.
+- **Report specific locations.** "`physics/engine.py:47`: unreachable return" not "there is dead code."
 - **Every PASS/FAIL cell must have evidence.** Empty evidence = UNCHECKED = REJECTED.
 
-## Verification Order
-
-1. **Read feature docs** — `.feature` files (all `@id` Examples), discovery docs, developer pre-mortem
-2. **Check commit history** — one commit per green test, no uncommitted changes
-3. **Run the app** — production-grade gate (see below)
-4. **Code review** — read source files, fill all tables with evidence
-5. **Run commands** — lint, static-check, test (stop on first failure)
-6. **Interactive verification** — if feature involves user interaction
-7. **Write report**
-
-**Do code review before running lint/static-check/test.** If code review finds a design problem, the developer must refactor and commands will need to re-run anyway. Do the hard cognitive work first.
-
 ## Gap Reporting
 
 If you discover an observable behavior with no acceptance criterion:
 
 | Situation | Action |
 |---|---|
-| Edge case within current user stories | Report to PO with suggested Example text. PO decides whether to add it. |
+| Edge case within current user stories | Report to PO with suggested Example text. PO decides. |
 | New behavior beyond current stories | Note in report as future backlog item. Do not add criteria. |
-| Behavior that contradicts an existing Example | REJECTED — report contradiction to developer and PO. |
-
-**You never edit `.feature` files or add Examples yourself.**.
-
-## Report Format
-
-```markdown
-## Step 5 Verification Report — <feature-name>
-
-### Production-Grade Gate
-| Check | Result | Notes |
-|---|---|---|
-| Developer declared production-grade | PASS / FAIL | |
-| App exits cleanly | PASS / FAIL / TIMEOUT | |
-| Output driven by real logic | PASS / FAIL | |
-
-### Commands
-| Command | Result | Notes |
-|---------|--------|-------|
-| uv run task lint | PASS / FAIL | <details if fail> |
-| uv run task static-check | PASS / FAIL | <errors if fail> |
-| uv run task test | PASS / FAIL | <failures or coverage% if fail> |
-
-### @id Traceability
-| @id | Example Title | Test | Status |
-|-----|---------------|------|--------|
-| `@id:a3f2b1c4` | <title> | `tests/features/<name>/<story>_test.py::test_<slug>_a3f2b1c4` | COVERED / NOT COVERED |
-
-### Code Review Findings
-- PASS: <aspect>
-- FAIL: `<file>:<line>` — <specific issue>
+| Behavior contradicts an existing Example | REJECTED — report contradiction to developer and PO. |
 
-### Decision
-**APPROVED** — work meets all standards. Developer may proceed to Step 6.
-OR
-**REJECTED** — fix the following before resubmitting:
-1. `<file>:<line>` — <specific, actionable fix required>
-```
+You never edit `.feature` files or add Examples yourself.
 
 ## Available Skills
 
-- `session-workflow` — read/update TODO.md at session boundaries
-- `verify` — full Step 5 verification protocol with all tables and gates
+- `session-workflow` — session start/end protocol
+- `verify` — Step 5: full verification protocol with all tables, gates, and report template
+- `implementation` — Step 4: REVIEWER CHECK section for per-test code-design checks
diff --git a/.opencode/skills/implementation/SKILL.md b/.opencode/skills/implementation/SKILL.md
index d88f61f..34bbb2e 100644
--- a/.opencode/skills/implementation/SKILL.md
+++ b/.opencode/skills/implementation/SKILL.md
@@ -57,36 +57,36 @@ Never write production code before picking a specific failing test. Never refact
 If `packages` is missing or the directory does not exist, stop and resolve with the stakeholder before writing any code.
 
 **Prerequisites — verify before starting:**
-1. `docs/features/in-progress/` contains only `.gitkeep` (no feature folders). If another feature folder exists, **STOP** — another feature is already in progress.
-2. The feature's `discovery.md` has `Status: BASELINED`. If not, escalate to the PO — Step 1 is incomplete.
-3. At least one `.feature` file in the feature folder contains `Example:` blocks with `@id` tags. If not, escalate to PO — criteria have not been written.
+1. `docs/features/in-progress/` contains only `.gitkeep` (no `.feature` files). If another `.feature` file exists, **STOP** — another feature is already in progress.
+2. The feature file's discovery section has `Status: BASELINED`. If not, escalate to the PO — Step 1 is incomplete.
+3. The feature file contains `Rule:` blocks with `Example:` blocks and `@id` tags. If not, escalate to the PO — criteria have not been written.
 
 **Steps:**
 
-1. Move the feature folder from `backlog/` to `in-progress/`:
+1. Move the feature file from `backlog/` to `in-progress/`:
    ```bash
-   mv docs/features/backlog/<name>/ docs/features/in-progress/<name>/
+   mv docs/features/backlog/<name>.feature docs/features/in-progress/<name>.feature
    ```
 2. Update `TODO.md` Source path from `backlog/` to `in-progress/`.
-3. Read both `docs/features/discovery.md` (project-level) and the feature's `discovery.md`
+3. Read both `docs/features/discovery.md` (project-level) and the feature file's discovery section
 4. Run a silent pre-mortem: YAGNI, KISS, DRY, SOLID, Object Calisthenics, design patterns
-5. Add the Architecture section to `docs/features/in-progress/<name>/discovery.md`:
+5. Add the Architecture section to `docs/features/in-progress/<name>.feature` (append to the feature description, before the first `Rule:`):
 
-```markdown
-## Architecture
+```gherkin
+  Architecture:
 
-### Module Structure
-- `<package>/domain/entity.py` — data classes and value objects
-- `<package>/domain/service.py` — business logic
+  ### Module Structure
+  - `<package>/domain/entity.py` — data classes and value objects
+  - `<package>/domain/service.py` — business logic
 
-### Key Decisions
-ADR-001: <title>
-Decision: <what>
-Reason: <why in one sentence>
-Alternatives considered: <what was rejected and why>
+  ### Key Decisions
+  ADR-001: <title>
+  Decision: <what>
+  Reason: <why in one sentence>
+  Alternatives considered: <what was rejected and why>
 
-### Build Changes (needs PO approval: yes/no)
-- New runtime dependency: <name> — reason: <why>
+  ### Build Changes (needs PO approval: yes/no)
+  - New runtime dependency: <name> — reason: <why>
 ```
 
 6. **Architecture contradiction check**: Compare each ADR against each AC. If any architectural decision contradicts or circumvents an acceptance criterion, flag it and resolve with the PO before writing any production code.
@@ -140,14 +140,18 @@ Update `## Cycle State` Phase: `GREEN`
 1. **DRY**: extract duplication
 2. **SOLID**: split classes that have grown beyond one responsibility
 3. **Object Calisthenics** (enforce all 9 rules):
-   1. One level of indentation per method — extract inner blocks to helpers
+   1. One level of indentation per method — extract inner blocks to named helpers
    2. No `else` after `return` — return early, flatten the happy path
    3. Wrap all primitives — `EmailAddress(str)` not raw `str` for domain concepts
    4. First-class collections — wrap `list[User]` in a `UserList` class
    5. One dot per line — `user.address` then `address.city`, never `user.address.city`
    6. No abbreviations — `calculate` not `calc`, `manager` not `mgr`
    7. Small entities — functions ≤ 20 lines, classes ≤ 50 lines
-   8. ≤ 2 instance variables — extract to value objects or split the class
+   8. ≤ 2 instance variables — if a class has 3+ `self.x` in `__init__`, group related
+      fields into a new named value object (Rule 3) or collection class (Rule 4). The fix
+      must produce a **new named class** — hardcoding constants, inlining literals,
+      using class-level variables, or moving fields to a parent class are all invalid
+      workarounds and remain FAIL.
    9. No getters/setters — use commands (`activate()`) and queries (`is_active()`)
 4. **Type hints**: add/fix type annotations on all public functions and classes
 5. **Docstrings**: Google-style on all public functions and classes
@@ -185,6 +189,7 @@ After refactor, before moving to self-declaration:
 | Bare `int`/`str` as domain concept | Wrap in value object | Verify no raw primitives in signatures |
 | > 4 positional parameters | Group into dataclass | Verify parameter count |
 | `list[X]` as domain collection | Wrap in collection class | Verify no bare lists |
+| Class with 3+ `self.x` in `__init__` | Group related fields into a new named value object (OC-3) or collection class (OC-4) — **not** a dict, tuple, class variable, constant, or parent class | Count `self.` assignments again; each fix must produce a new named class |
 
 ```bash
 uv run task test-fast     # must still pass — the ONLY check during refactor
@@ -196,52 +201,49 @@ Update `## Cycle State` Phase: `REFACTOR`
 
 ### Design Self-Declaration
 
-After refactor is complete and `test-fast` passes, complete this checklist before requesting the reviewer check. Include the filled-in checklist in your reviewer check request — this is the structured audit target the reviewer will verify against the actual code.
-
-*For each item: check the box and cite `file:line` evidence, or explain why the rule does not apply to the code changed in this cycle.*
-
-#### YAGNI
-- [ ] No abstractions added beyond what the current acceptance criteria require
-- [ ] No speculative parameters, flags, or extension points for hypothetical future use
+After refactor is complete and `test-fast` passes, write the self-declaration **into `TODO.md`** under a `## Self-Declaration` block (replacing any prior one), then request the reviewer check. The reviewer will read `TODO.md` directly — do not paste the checklist into a separate message.
 
-#### KISS
-- [ ] Every function can be described in one sentence without "and"
-- [ ] No unnecessary indirection, wrapper layers, or complexity
+**Write this block into `TODO.md` now, filling in every item before requesting review:**
 
-#### DRY
-- [ ] No logic duplicated across functions or classes
-- [ ] Shared concepts extracted into a single reusable location
-
-#### SOLID
-- [ ] **S** — each class/function has exactly one reason to change (`file:line`)
-- [ ] **O** — new behavior added via extension, not by editing existing class bodies
-- [ ] **L** — subtypes fully substitutable; no subtype narrows a contract or raises where base does not
-- [ ] **I** — no Protocol/ABC forces unused method implementations
-- [ ] **D** — domain classes import from abstractions (Protocols), not from I/O or framework layers directly
+```markdown
+## Self-Declaration (@id:<hex>)
+- [ ] YAGNI-1: No abstractions beyond current AC — `file:line`
+- [ ] YAGNI-2: No speculative parameters or flags for hypothetical future use — `file:line`
+- [ ] KISS-1: Every function has one job, describable in one sentence without "and" — `file:line`
+- [ ] KISS-2: No unnecessary indirection, wrapper layers, or complexity — `file:line`
+- [ ] DRY-1: No logic block duplicated across two or more locations — `file:line`
+- [ ] DRY-2: Every shared concept extracted to exactly one place — `file:line`
+- [ ] SOLID-S: Each class/function has one reason to change — `file:line`
+- [ ] SOLID-O: New behavior added by extension, no existing class body edited — `file:line` or N/A
+- [ ] SOLID-L: Every subtype fully substitutable; no narrowed contract or surprise raise — `file:line` or N/A
+- [ ] SOLID-I: No Protocol/ABC forces an implementor to leave a method as `...` or raise — `file:line` or N/A
+- [ ] SOLID-D: Domain classes depend on Protocols, not on I/O or framework imports directly — `file:line`
+- [ ] OC-1: Max one indent level per method; inner blocks extracted to named helpers — deepest: `file:line`
+- [ ] OC-2: No `else` after `return`; all branches return early and the happy path is flat — `file:line` or N/A
+- [ ] OC-3: No bare `int`/`str`/`float` as domain concepts in public signatures; each wrapped in a named type — `file:line` or N/A
+- [ ] OC-4: No bare `list[X]`/`set[X]` as domain values; each wrapped in a named collection class — `file:line` or N/A
+- [ ] OC-5: No `a.b.c()` chains; each dot navigation step assigned to a named local — `file:line` or N/A
+- [ ] OC-6: No abbreviations anywhere; every name is a full word readable without context — `file:line` or N/A
+- [ ] OC-7: Every function ≤ 20 lines, every class ≤ 50 lines — longest: `file:line`
+- [ ] OC-8: Every class has ≤ 2 `self.x` in `__init__`; if > 2 before this cycle, name the new value object extracted and cite `file:line` per class
+- [ ] OC-9: No `get_x()`/`set_x()` pairs; state changes via commands, queries return values — `file:line` or N/A
+- [ ] Semantic: test Given/When/Then operates at the same abstraction level as the AC — `file:line`
+```
 
-#### Object Calisthenics
-- [ ] Rule 1 — one indent level per method (`file:line` of deepest nesting)
-- [ ] Rule 2 — no `else` after `return`; early returns only
-- [ ] Rule 3 — primitives wrapped: no bare `int`/`str` as domain concepts in public signatures
-- [ ] Rule 4 — collections wrapped: no bare `list[X]` as domain values
-- [ ] Rule 5 — one dot per line: no `a.b.c()` chains
-- [ ] Rule 6 — no abbreviations in names
-- [ ] Rule 7 — functions ≤ 20 lines, classes ≤ 50 lines (cite longest: `file:line`)
-- [ ] Rule 8 — ≤ 2 instance variables per class (cite any with 2: `file:line`)
-- [ ] Rule 9 — no getters/setters; tell-don't-ask (`get_x()`/`set_x()` = FAIL)
+*For every item: check the box AND cite `file:line` evidence, or write `N/A` with a one-line reason. An unchecked box or missing evidence is an automatic REJECTED.*
 
 Update `## Cycle State` Phase: `SELF-DECLARE`
 
 ## REVIEWER CHECK — Code Design Only
 
-After each test goes green + refactor + self-declaration, **STOP** and request a reviewer check. Include the filled-in Design Self-Declaration checklist in your request.
+After each test goes green + refactor + self-declaration, **STOP** and request a reviewer check. The reviewer will read the `## Self-Declaration` block from `TODO.md` directly — point them to it.
 
 **STOP — request a reviewer check of code design and semantic alignment.**
 **WAIT for APPROVED before committing.**
 
 The reviewer is scoped to **code design only** (not full Step 5):
 
-**What the reviewer receives**: The developer's completed Design Self-Declaration with `file:line` evidence for each rule.
+**What the reviewer receives**: The developer's completed `## Self-Declaration` block in `TODO.md`, with `file:line` evidence for each rule.
 
 **What the reviewer does**: Independently inspects the actual code for each rule the developer claimed compliant. The self-declaration is an audit target — the reviewer verifies claims, not just reads them.
 
@@ -307,7 +309,7 @@ If during implementation you discover a behavior not covered by existing accepta
 - Note the gap in TODO.md under `## Next`
 - The PO will decide whether to add a new Example to the `.feature` file
 
-Extra tests in `tests/unit/` are allowed freely (coverage, edge cases, etc.) — these do not need `@id` traceability.
+Extra tests in `tests/unit/` are allowed freely (coverage, edge cases, etc.) — these do not need `@id` traceability. Use Hypothesis (`@given`) for properties that hold across many inputs; use plain pytest for specific behaviors or single edge cases. `@pytest.mark.slow` is mandatory on every `@given`-decorated test.
 
 ## Signature Design
 
diff --git a/.opencode/skills/scope/SKILL.md b/.opencode/skills/scope/SKILL.md
index ac33fb7..9464f90 100644
--- a/.opencode/skills/scope/SKILL.md
+++ b/.opencode/skills/scope/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: scope
 description: Step 1 — discover requirements through stakeholder interviews and write Gherkin acceptance criteria
-version: "2.0"
+version: "3.0"
 author: product-owner
 audience: product-owner
 workflow: feature-lifecycle
@@ -13,7 +13,7 @@ This skill guides the PO through Step 1 of the feature lifecycle: interviewing t
 
 ## When to Use
 
-When the PO is starting a new project or a new feature. The output is a set of discovery documents and `.feature` files in `docs/features/backlog/<name>/`.
+When the PO is starting a new project or a new feature. The output is a set of `.feature` files in `docs/features/backlog/`.
 
 ## Overview
 
@@ -22,9 +22,9 @@ Step 1 has 4 phases:
 | Phase | Who | Output |
 |---|---|---|
 | 1. Project Discovery | PO + stakeholder | `docs/features/discovery.md` + feature list |
-| 2. Feature Discovery | PO + stakeholder | `docs/features/backlog/<name>/discovery.md` |
-| 3. Stories | PO alone | `<story-slug>.feature` files (no Examples) |
-| 4. Criteria | PO alone | `Example:` blocks with `@id` tags |
+| 2. Feature Discovery | PO + stakeholder | Discovery section embedded in `docs/features/backlog/<name>.feature` |
+| 3. Stories | PO alone | `Rule:` blocks in the `.feature` file (no Examples) |
+| 4. Criteria | PO alone | `Example:` blocks with `@id` tags under each `Rule:` |
 
 ---
 
@@ -64,7 +64,7 @@ Present all follow-up questions at once. Continue until all questions have statu
 
 When all questions are answered, autonomously set `Status: BASELINED` in `docs/features/discovery.md`.
 
-From the answers, identify the feature list. For each feature, create `docs/features/backlog/<name>/discovery.md` using the per-feature template (with Entities table).
+From the answers, identify the feature list. For each feature, create `docs/features/backlog/<name>.feature` using the feature file template (discovery section only — no Rules yet).
 
 Commit: `feat(discovery): baseline project discovery`
 
@@ -76,7 +76,7 @@ Commit: `feat(discovery): baseline project discovery`
 
 ### 2.1 Derive Questions from Feature Entities
 
-Open `docs/features/backlog/<name>/discovery.md`. This step happens **before** any stakeholder interaction.
+Open `docs/features/backlog/<name>.feature`. This step happens **before** any stakeholder interaction.
 
 1. **Populate the Entities table**: Extract nouns (candidate classes/models) and verbs (candidate methods/features) from project discovery answers relevant to this feature. Mark each as in-scope or not.
 2. **Generate questions from entities**: For each in-scope entity, ask:
@@ -103,13 +103,13 @@ Present **all** questions to the stakeholder at once. After receiving answers:
 
 Before moving to Phase 3, check: does this feature span **>2 distinct concerns** OR have **>8 candidate Examples**? If yes:
 
-1. Split into separate features in `backlog/` — each addressing a single cohesive concern
-2. Create a new `discovery.md` for each split feature
+1. Split into separate `.feature` files in `backlog/` — each addressing a single cohesive concern
+2. Populate the discovery section for each split feature
 3. Re-run Phase 2 for any split feature that needs its own discovery
 
 ### 2.4 Baseline
 
-When the stakeholder says "baseline" (and decomposition check passes), set `Status: BASELINED` in the feature `discovery.md`.
+When the stakeholder says "baseline" (and decomposition check passes), set `Status: BASELINED (YYYY-MM-DD)` in the feature file's discovery section.
 
 Commit: `feat(discovery): baseline <name> feature discovery`
 
@@ -119,19 +119,19 @@ Commit: `feat(discovery): baseline <name> feature discovery`
 
 **When**: After feature discovery is baselined. PO works alone.
 
-### 3.1 Write User Story Files
+### 3.1 Write Rule Blocks
 
-Create one `.feature` file per user story in `docs/features/backlog/<name>/`.
+Add one `Rule:` block per user story to the `.feature` file, after the discovery section.
 
-Filename: `<story-slug>.feature` — kebab-case, 2-4 words.
-
-Content (no Examples yet):
+Each `Rule:` block contains:
+- The rule title (2-4 words, kebab-friendly)
+- The user story header as the rule description (no `Example:` blocks yet):
 
 ```gherkin
-Feature: <Title in natural language>
-  As a <role>
-  I want <goal>
-  So that <benefit>
+  Rule: Menu Display
+    As a player
+    I want to see a menu when the game starts
+    So that I can select game options
 ```
 
 Good stories are:
@@ -142,27 +142,27 @@ Good stories are:
 - **Small**: completable in one feature cycle
 - **Testable**: can be verified with a concrete test
 
-Avoid: "As the system, I want..." (no business value). Break down stories that contain "and" into two stories.
+Avoid: "As the system, I want..." (no business value). Break down stories that contain "and" into two Rules.
 
 ### 3.2 INVEST Gate
 
-Before committing, verify every story passes:
+Before committing, verify every Rule passes:
 
 | Letter | Question | FAIL action |
 |---|---|---|
-| **I**ndependent | Can this story be delivered without other stories? | Split or reorder dependencies |
+| **I**ndependent | Can this Rule be delivered without other Rules? | Split or reorder dependencies |
 | **N**egotiable | Are details open to discussion with the developer? | Remove over-specification |
 | **V**aluable | Does it deliver something the end user cares about? | Reframe or drop |
 | **E**stimable | Can a developer estimate the effort? | Split or add discovery questions |
-| **S**mall | Completable in one feature cycle? | Split into smaller stories |
+| **S**mall | Completable in one feature cycle? | Split into smaller Rules |
 | **T**estable | Can it be verified with a concrete test? | Rewrite with observable outcomes |
 
 ### 3.3 Review Checklist
 
-- [ ] Every story has a distinct user role and benefit
-- [ ] No story duplicates another
-- [ ] Stories collectively cover all entities marked in-scope in `discovery.md`
-- [ ] Every story passes the INVEST gate
+- [ ] Every Rule has a distinct user role and benefit
+- [ ] No Rule duplicates another
+- [ ] Rules collectively cover all entities marked in-scope in the discovery section
+- [ ] Every Rule passes the INVEST gate
 
 Commit: `feat(stories): write user stories for <name>`
 
@@ -172,15 +172,15 @@ Commit: `feat(stories): write user stories for <name>`
 
 **When**: After stories are written. PO works alone.
 
-### 4.1 Silent Pre-mortem Per Story
+### 4.1 Silent Pre-mortem Per Rule
 
-For each `.feature` file, ask internally:
+For each `Rule:` block, ask internally:
 
-> "What observable behaviors must we prove for this story to be complete?"
+> "What observable behaviors must we prove for this Rule to be complete?"
 
 ### 4.2 Write Example Blocks
 
-Add `Example:` blocks to each `.feature` file. Each Example gets an `@id:<8-char-hex>` tag.
+Add `Example:` blocks under each `Rule:`. Each Example gets an `@id:<8-char-hex>` tag.
 
 **ID generation**:
 ```bash
@@ -190,11 +190,16 @@ uv run task gen-id
 **Format** (mandatory):
 
 ```gherkin
-  @id:a3f2b1c4
-  Example: Ball bounces off top wall
-    Given a ball moving upward reaches y=0
-    When the physics engine processes the next frame
-    Then the ball velocity y-component becomes positive
+  Rule: Wall bounce
+    As a player
+    I want balls to bounce off walls
+    So that gameplay feels physical
+
+    @id:a3f2b1c4
+    Example: Ball bounces off top wall
+      Given a ball moving upward reaches y=0
+      When the physics engine processes the next frame
+      Then the ball velocity y-component becomes positive
 ```
 
 **Rules**:
@@ -205,7 +210,7 @@ uv run task gen-id
 - **Observable means observable by the end user**, not by a test harness
 - **Declarative, not imperative** — describe behavior, not UI steps
 - Each Example must be observably distinct from every other
-- A single `.feature` file must not span multiple concerns — split into separate `.feature` files if needed (a feature folder can contain multiple `.feature` files)
+- If a single feature spans multiple concerns, split into separate `.feature` files
 - If user interaction is involved, the Feature description must declare the interaction model
 
 **Declarative vs. imperative Gherkin**:
@@ -218,7 +223,7 @@ uv run task gen-id
 
 Write Examples that describe *what happens*, not *how the user clicks through the UI*. Imperative steps couple tests to specific UI layouts and break when the UI changes.
 
-**MoSCoW triage**: When a story spans multiple concerns or has many candidate Examples, ask for each one: is this a **Must** (required for the story to be correct), a **Should** (high value but deferrable), or a **Could** (nice-to-have edge case)? If the story spans >2 concerns or Musts alone exceed 8, the story needs splitting.
+**MoSCoW triage**: When a Rule spans multiple concerns or has many candidate Examples, ask for each one: is this a **Must** (required for the Rule to be correct), a **Should** (high value but deferrable), or a **Could** (nice-to-have edge case)? If the Rule spans >2 concerns or Musts alone exceed 8, the Rule needs splitting.
 
 **Common mistakes to avoid**:
 - "Then: It works correctly" (not measurable)
@@ -230,14 +235,14 @@ Write Examples that describe *what happens*, not *how the user clicks through th
 ### 4.3 Review Checklist
 
 Before committing:
-- [ ] Every `.feature` file has at least one Example
-- [ ] Every `@id` is unique within this feature (check: `grep -r "@id:" docs/features/backlog/<name>/`)
+- [ ] Every `Rule:` block has at least one Example
+- [ ] Every `@id` is unique within this feature (check: `grep "@id:" docs/features/backlog/<name>.feature`)
 - [ ] Every Example has `Given/When/Then`
 - [ ] Every `Then` is a single, observable, measurable outcome
 - [ ] No Example tests implementation details
 - [ ] If user interaction is involved, the interaction model is declared in the Feature description
 - [ ] Each Example is observably distinct from every other
-- [ ] No single `.feature` file spans multiple concerns (split if needed)
+- [ ] No single feature file spans multiple unrelated concerns
 
 ### 4.4 Final Pre-mortem
 
@@ -250,65 +255,81 @@ Add any discoveries as new Examples.
 ### 4.5 Commit and Freeze
 
 ```bash
-git add docs/features/backlog/<name>/
+git add docs/features/backlog/<name>.feature
 git commit -m "feat(criteria): write acceptance criteria for <name>"
 ```
 
-**After this commit, the `.feature` files are frozen.** Any change requires:
+**After this commit, the `Example:` blocks are frozen.** Any change requires:
 1. Add `@deprecated` tag to the old Example
 2. Write a new Example with a new `@id`
 3. Run `uv run task gen-tests` to sync test stubs
 
 ---
 
-## Discovery Document Formats
+## Feature File Format
 
-### Project-Level (`docs/features/discovery.md`)
+Each feature is a single `.feature` file. The free-form description before the first `Rule:` contains all discovery content. Architecture is added later by the developer (Step 2).
 
-```markdown
-# Discovery: <project-name>
+```gherkin
+Feature: <Feature title>
 
-## State
-Status: ELICITING | BASELINED
+  Discovery:
 
-## Questions
-| ID | Question | Answer | Status |
-|----|----------|--------|--------|
-| Q1 | Who are the users? | ... | OPEN / ANSWERED |
-```
+  Status: ELICITING | BASELINED (YYYY-MM-DD)
 
-No Entities table at project level.
+  Entities:
+  | Type | Name | Candidate Class/Method | In Scope |
+  |------|------|----------------------|----------|
+  | Noun | Ball | Ball | Yes |
+  | Verb | Bounce | Ball.bounce() | Yes |
 
-### Per-Feature (`docs/features/backlog/<name>/discovery.md`)
+  Rules (Business):
+  - <Business rule that applies across multiple Examples>
 
-```markdown
-# Discovery: <feature-name>
+  Constraints:
+  - <Non-functional requirement specific to this feature>
 
-## State
-Status: ELICITING | BASELINED
+  Questions:
+  | ID | Question | Answer | Status |
+  |----|----------|--------|--------|
+  | Q1 | ... | ... | OPEN / ANSWERED |
+
+  All questions answered. Discovery frozen.
 
-## Entities
-| Type | Name | Candidate Class/Method | In Scope |
-|------|------|----------------------|----------|
-| Noun | Ball | Ball | Yes |
-| Verb | Bounce | Ball.bounce() | Yes |
+  Rule: <User story title>
+    As a <role>
+    I want <goal>
+    So that <benefit>
 
-## Rules
-Business rules that apply across multiple Examples. Each rule explains *why* a group of Examples exists.
+    @id:a3f2b1c4
+    Example: <Concrete scenario title>
+      Given <initial context>
+      When <event or action>
+      Then <observable outcome>
+
+    @deprecated @id:b5c6d7e8
+    Example: <Superseded scenario>
+      Given ...
+      When ...
+      Then ...
+```
 
-- <Rule description>
+The **Rules (Business)** section captures the business-rule layer: each rule may generate multiple Examples, and identifying rules first prevents redundant or contradictory Examples.
 
-## Constraints
-Non-functional requirements specific to this feature (performance, security, usability, etc.).
+The **Constraints** section captures non-functional requirements. Testable constraints should become `Example:` blocks with `@id` tags.
 
-- <Constraint description>
+### Project-Level Discovery (`docs/features/discovery.md`)
+
+```markdown
+# Discovery: <project-name>
+
+## State
+Status: ELICITING | BASELINED
 
 ## Questions
 | ID | Question | Answer | Status |
 |----|----------|--------|--------|
-| Q1 | ... | ... | OPEN / ANSWERED |
+| Q1 | Who are the users? | ... | OPEN / ANSWERED |
 ```
 
-The **Rules** section captures the business-rule layer from Example Mapping: each rule may generate multiple Examples, and identifying rules first prevents redundant or contradictory Examples.
-
-The **Constraints** section captures non-functional requirements. Testable constraints should become `Example:` blocks with `@id` tags. System-wide constraints belong in the project-level `discovery.md`.
+No Entities table at project level.
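
Reviewer note: the `@id` uniqueness item in the 4.3 review checklist is done with `grep` in the skill text. A minimal Python sketch of the same check is given here as an illustration only. It assumes IDs are 8 lowercase hex characters; `duplicate_ids` and the sample feature text are hypothetical and not part of the repository.

```python
import re
from collections import Counter

# Assumption: tags look like "@id:a3f2b1c4" (8 lowercase hex characters).
ID_TAG = re.compile(r"@id:([0-9a-f]{8})\b")


def duplicate_ids(feature_text: str) -> list[str]:
    """Return every @id value that appears more than once, sorted."""
    counts = Counter(ID_TAG.findall(feature_text))
    return sorted(tag for tag, seen in counts.items() if seen > 1)


# Hypothetical feature file content with one repeated @id.
sample = """\
Feature: Wall physics

  Rule: Wall bounce
    @id:a3f2b1c4
    Example: Ball bounces off top wall

    @deprecated @id:b5c6d7e8
    Example: Superseded scenario

    @id:a3f2b1c4
    Example: Ball bounces off bottom wall
"""

print(duplicate_ids(sample))
```

Deprecated Examples are counted as well, on the assumption that a superseded `@id` should never be reused.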
diff --git a/.opencode/skills/scope/discovery-template.md b/.opencode/skills/scope/discovery-template.md
index 5079e99..04329fa 100644
--- a/.opencode/skills/scope/discovery-template.md
+++ b/.opencode/skills/scope/discovery-template.md
@@ -1,16 +1,17 @@
-# Discovery: <feature-name>
+Feature: <feature-name>
 
-## State
-Status: ELICITING
+  Discovery:
 
-## Entities
-| Type | Name | Candidate Class/Method | In Scope |
-|------|------|----------------------|----------|
+  Status: ELICITING
 
-## Rules
+  Entities:
+  | Type | Name | Candidate Class/Method | In Scope |
+  |------|------|----------------------|----------|
 
-## Constraints
+  Rules (Business):
 
-## Questions
-| ID | Question | Answer | Status |
-|----|----------|--------|--------|
+  Constraints:
+
+  Questions:
+  | ID | Question | Answer | Status |
+  |----|----------|--------|--------|
diff --git a/.opencode/skills/session-workflow/SKILL.md b/.opencode/skills/session-workflow/SKILL.md
index f7c7afd..95562a4 100644
--- a/.opencode/skills/session-workflow/SKILL.md
+++ b/.opencode/skills/session-workflow/SKILL.md
@@ -16,7 +16,7 @@ Every session starts by reading state. Every session ends by writing state. This
 1. Read `TODO.md` — find current feature, current step, and the "Next" line.
    - If `TODO.md` does not exist, run `uv run task gen-todo` to create it, then read the result.
 2. If a feature is active, read:
-   - `docs/features/in-progress/<name>/discovery.md` — feature discovery
+   - `docs/features/in-progress/<name>.feature` — feature file (discovery + architecture + Rules + Examples)
    - `docs/features/discovery.md` — project-level discovery (for context)
 3. Run `git status` — understand what is committed vs. what is not
 4. Confirm scope: you are working on exactly one step of one feature
@@ -56,7 +56,7 @@ When a step completes within a session:
 
 Feature: <name>
 Step: <1-6> (<step name>)
-Source: docs/features/in-progress/<name>/discovery.md
+Source: docs/features/in-progress/<name>.feature
 
 ## Progress
 - [x] `@id:<hex>`: <description>
@@ -68,9 +68,9 @@ Source: docs/features/in-progress/<name>/discovery.md
 ```
 
 **Source path by step:**
-- Step 1: `Source: docs/features/backlog/<name>/discovery.md`
-- Steps 2–5: `Source: docs/features/in-progress/<name>/discovery.md`
-- Step 6: `Source: docs/features/completed/<name>/discovery.md`
+- Step 1: `Source: docs/features/backlog/<name>.feature`
+- Steps 2–5: `Source: docs/features/in-progress/<name>.feature`
+- Step 6: `Source: docs/features/completed/<name>.feature`
 
 Status markers:
 - `[ ]` — not started
@@ -90,17 +90,28 @@ Next: PO picks feature from docs/features/backlog/ and moves it to docs/features
 
 During Step 4 (Implementation), TODO.md **must** include a `## Cycle State` block to track Red-Green-Refactor-Review progress. This block is **mandatory** — missing it means the cycle is unverifiable.
 
+When `Phase: SELF-DECLARE` or later, a `## Self-Declaration` block is also **mandatory**. The reviewer reads it directly from TODO.md. A missing or incomplete self-declaration (unchecked boxes, missing `file:line`) = automatic REJECTED.
+
+For the full Self-Declaration checklist template (21 items), see `implementation/SKILL.md` — the "Design Self-Declaration" section under REFACTOR.
+
 ```markdown
 # Current Work
 
 Feature: <name>
 Step: 4 (implement)
-Source: docs/features/in-progress/<name>/discovery.md
+Source: docs/features/in-progress/<name>.feature
 
 ## Cycle State
 Test: `@id:<hex>` — <description>
 Phase: RED | GREEN | REFACTOR | SELF-DECLARE | REVIEWER(code-design) | COMMITTED
 
+## Self-Declaration (@id:<hex>)
+- [x] YAGNI-1: … — `file:line`
+- [x] YAGNI-2: … — `file:line`
+- [x] KISS-1: … — `file:line`
+  … (full checklist from implementation/SKILL.md)
+- [x] Semantic: test abstraction matches AC abstraction — `file:line`
+
 ## Progress
 - [x] `@id:<hex>`: <description> — reviewer(code-design) APPROVED
 - [~] `@id:<hex>`: <description>          ← in progress (see Cycle State)
@@ -142,3 +153,4 @@ Run `gen-todo` at session start (after reading TODO.md) and at session end (befo
 5. The "Next" line must be actionable enough that a fresh AI can execute it without asking questions
 6. During Step 4, always update `## Cycle State` when transitioning between RED/GREEN/REFACTOR/SELF-DECLARE/REVIEWER phases
 7. When a step completes, update TODO.md and commit **before** any further work
+8. During Step 4, write the `## Self-Declaration (@id:<hex>)` block into TODO.md at SELF-DECLARE phase — every checkbox must be checked with a `file:line` or `N/A` before requesting reviewer(code-design)
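
Reviewer note: rule 8 requires every Self-Declaration checkbox to be checked and to carry a `file:line` citation or `N/A`. A rough sketch of how that could be verified mechanically follows. The function name, the regex, and the sample block are assumptions made for illustration; the repository does not ship such a checker, and the exact line format in `TODO.md` may differ.

```python
import re

# Assumed item shape: "- [x] <claim> — `path:line`" or "- [x] <claim> — N/A <reason>".
CHECKED = re.compile(r"^- \[x\] .+ — (`[^`]+:\d+`|N/A\b.*)$")


def incomplete_items(block: str) -> list[str]:
    """Return checklist items that are unchecked or lack evidence."""
    lines = [line.strip() for line in block.splitlines()]
    items = [line for line in lines if line.startswith("- [")]
    return [item for item in items if not CHECKED.match(item)]


# Hypothetical Self-Declaration excerpt: two valid items, two that should fail.
sample = """\
- [x] YAGNI-1: no speculative code — `src/ball.py:12`
- [x] OC-2: no else after return — N/A, no branches in this change
- [ ] OC-7: functions under 20 lines — `src/ball.py:40`
- [x] KISS-1: simplest solution
"""

for item in incomplete_items(sample):
    print("REJECTED:", item)
```

An empty result would mean the block is complete in form only; the reviewer still has to verify each claim against the code.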
diff --git a/.opencode/skills/session-workflow/scripts/gen_todo.py b/.opencode/skills/session-workflow/scripts/gen_todo.py
index b4b883c..980df1e 100644
--- a/.opencode/skills/session-workflow/scripts/gen_todo.py
+++ b/.opencode/skills/session-workflow/scripts/gen_todo.py
@@ -1,6 +1,6 @@
 """Generate and sync the TODO.md session bookmark from .feature files.
 
-Reads the in-progress feature folder (or backlog if no in-progress feature),
+Reads the in-progress .feature file (or reports none if not present),
 merges missing @id rows into the existing TODO.md, and writes the result.
 
 Modes:
@@ -10,7 +10,7 @@
 Merge rules:
     - Adds @id rows that are in .feature files but missing from TODO.md
     - Never removes or downgrades existing [x], [~], [-] rows
-    - Updates the Feature/Step/Source header from the in-progress folder
+    - Updates the Feature/Step/Source header from the in-progress file
     - If no feature is in-progress, writes the "No feature in progress" format
 """
 
@@ -41,53 +41,48 @@ class Criterion:
 
 
 def find_in_progress_feature() -> tuple[str, Path] | None:
-    """Find the single feature currently in docs/features/in-progress/.
+    """Find the single .feature file currently in docs/features/in-progress/.
 
     Returns:
-        Tuple of (feature_name, feature_path) or None if nothing is in progress.
+        Tuple of (feature_name, feature_file_path) or None if nothing is in progress.
+        feature_name is the .feature file stem (e.g. 'display-version').
     """
     in_progress = FEATURES_DIR / "in-progress"
     if not in_progress.exists():
         return None
-    folders = [
-        f
-        for f in in_progress.iterdir()
-        if f.is_dir() and f.name != ".gitkeep" and not f.name.startswith(".")
+    feature_files = [
+        f for f in in_progress.iterdir() if f.is_file() and f.suffix == ".feature"
     ]
-    if not folders:
+    if not feature_files:
         return None
-    return folders[0].name, folders[0]
+    feature_file = feature_files[0]
+    return feature_file.stem, feature_file
 
 
 def find_backlog_features() -> list[str]:
     """List feature names in docs/features/backlog/.
 
     Returns:
-        Sorted list of feature folder names.
+        Sorted list of .feature file stems.
     """
     backlog = FEATURES_DIR / "backlog"
     if not backlog.exists():
         return []
     return sorted(
-        f.name
-        for f in backlog.iterdir()
-        if f.is_dir() and f.name != ".gitkeep" and not f.name.startswith(".")
+        f.stem for f in backlog.iterdir() if f.is_file() and f.suffix == ".feature"
     )
 
 
 def extract_criteria(feature_path: Path) -> list[Criterion]:
-    """Extract all @id-tagged Examples from .feature files in a feature folder.
+    """Extract all @id-tagged Examples from a single .feature file.
 
     Args:
-        feature_path: Path to the feature folder.
+        feature_path: Path to the .feature file.
 
     Returns:
         Ordered list of Criterion objects (deprecated ones included).
     """
-    criteria: list[Criterion] = []
-    for feature_file in sorted(feature_path.glob("*.feature")):
-        criteria.extend(_parse_feature_file(feature_file))
-    return criteria
+    return _parse_feature_file(feature_path)
 
 
 def _parse_feature_file(path: Path) -> list[Criterion]:
@@ -340,7 +335,7 @@ def sync_todo(*, check_only: bool = False) -> int:
     step = (
         _extract_header_field(existing_text, "Step") or "? (unknown — update manually)"
     )
-    source = f"docs/features/in-progress/{feature_name}/discovery.md"
+    source = f"docs/features/in-progress/{feature_name}.feature"
     next_action = _extract_next_action(existing_text)
 
     progress_lines = build_progress_lines(criteria, existing_progress)
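
Reviewer note: the naming convention introduced by this change (one test file per `Rule:` block, one test function per `@id`) can be sketched as below. The `slugify` body mirrors the regex version in the `gen_test_stubs.py` diff; `stub_path`, `stub_function_name`, and the `wall-physics` feature stem are hypothetical helpers and values used only for illustration.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse every non-alphanumeric run into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def stub_path(feature_stem: str, rule_title: str) -> str:
    """Build the test file path for one Rule: block (hypothetical helper)."""
    return f"tests/features/{feature_stem}/{slugify(rule_title)}_test.py"


def stub_function_name(rule_title: str, example_id: str) -> str:
    """Build the test function name for one Example (hypothetical helper)."""
    return f"test_{slugify(rule_title)}_{example_id}"


print(stub_path("wall-physics", "Wall bounce"))
print(stub_function_name("Wall bounce", "a3f2b1c4"))
```

Because the slug is derived from the Rule title, renaming a Rule would also change the generated file and function names, which is worth keeping in mind once Examples are frozen.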
diff --git a/.opencode/skills/tdd/SKILL.md b/.opencode/skills/tdd/SKILL.md
index 6a33e27..fb4d5a8 100644
--- a/.opencode/skills/tdd/SKILL.md
+++ b/.opencode/skills/tdd/SKILL.md
@@ -33,6 +33,8 @@ Always run `--check` first to review planned changes before applying them.
 
 The script reads `.feature` files from `docs/features/{backlog,in-progress,completed}/` and creates/updates test files in `tests/features/<feature-name>/`.
 
+For each feature file, the script iterates over `Rule:` blocks. Each Rule maps to one test file named `<rule-slug>_test.py`. Examples within a Rule map to test functions in that file.
+
 | `.feature` state | Script action |
 |---|---|
 | New `@id` Example | Create stub with `raise NotImplementedError` |
@@ -46,26 +48,29 @@ The script reads `.feature` files from `docs/features/{backlog,in-progress,compl
 ## Test File Structure
 
 ```
-tests/features/<feature-name>/<story-slug>_test.py    ← one per .feature file
+tests/features/<feature-name>/<rule-slug>_test.py     ← one per Rule: block
 tests/unit/<anything>_test.py                          ← developer-authored extras
 ```
 
+- `<feature-name>` = the `.feature` file stem (kebab-case folder name)
+- `<rule-slug>` = the `Rule:` title slugified (spaces and hyphens replaced by underscores, lowercased)
+
 ## Test Function Naming
 
 Generated by `gen-tests`:
 
 ```python
-def test_<feature_slug>_<8char_hex>() -> None:
+def test_<rule_slug>_<8char_hex>() -> None:
 ```
 
-- `feature_slug` = feature folder name with hyphens replaced by underscores
-- `8char_hex` = the `@id` from the `.feature` file
+- `rule_slug` = the `Rule:` title with spaces and hyphens replaced by underscores, lowercased
+- `8char_hex` = the `@id` from the `Example:` block
 
 ## Docstring Format (mandatory)
 
 ```python
 @pytest.mark.unit
-def test_bounce_physics_a3f2b1c4() -> None:
+def test_wall_bounce_a3f2b1c4() -> None:
     """
     Given: A ball moving upward reaches y=0
     When: The physics engine processes the next frame
@@ -98,14 +103,25 @@ The correct test asserts on the return value. The wrong test breaks if you renam
 
 ## Test Tool Decision
 
-| Situation | Tool |
-|---|---|
-| Deterministic input/output, one scenario | Plain pytest |
-| Pure function, many input combinations | Hypothesis `@given` |
-| Stateful system with sequences of operations | Hypothesis stateful testing |
+Tests in `tests/features/` are generated from `@id` criteria — use plain pytest there.
+
+Tests in `tests/unit/` cover gaps not represented by any acceptance criterion. Any test style is valid — plain `assert` or Hypothesis `@given`. Use Hypothesis when the test covers a **property** that holds across many inputs (mathematical invariants, parsing contracts, value object constraints). Use plain pytest for specific behaviors or single edge cases discovered during refactoring.
+
+| Situation | Location | Tool |
+|---|---|---|
+| Deterministic scenario from a `.feature` `@id` | `tests/features/` | Plain pytest (generated) |
+| Property holding across many input values | `tests/unit/` | Hypothesis `@given` |
+| Specific behavior or single edge case | `tests/unit/` | Plain pytest |
+| Stateful system with sequences of operations | `tests/unit/` | Hypothesis stateful testing |
 
 **Never use Hypothesis for**: I/O, side effects, network calls, database writes.
 
+### `tests/unit/` Rules
+
+- `@pytest.mark.slow` is **mandatory** on every `@given`-decorated test (each property test runs many generated inputs, so it is slow)
+- `@example(...)` is optional but encouraged when using `@given` to document known corner cases
+- `@pytest.mark.unit` or `@pytest.mark.integration` is still required (exactly one per test)
+
 ## Markers (4 total)
 
 Every test gets exactly one of:
@@ -118,7 +134,7 @@ Additionally:
 
 ```python
 @pytest.mark.unit
-def test_bounce_physics_a3f2b1c4() -> None:
+def test_wall_bounce_a3f2b1c4() -> None:
     ...
 
 @pytest.mark.integration
@@ -131,13 +147,15 @@ When in doubt, start with `unit`. Upgrade to `integration` if the implementation
 
 ## Hypothesis Tests
 
+When using `@given` in `tests/unit/`, the required decorator order is:
+
 ```python
-@pytest.mark.unit
-@pytest.mark.slow
+@pytest.mark.unit        # required: exactly one of unit or integration
+@pytest.mark.slow        # required: mandatory on all @given tests
 @given(x=st.floats(min_value=-100, max_value=100, allow_nan=False))
-@example(x=0.0)
+@example(x=0.0)          # optional: document known corner cases
 @settings(max_examples=200)
-def test_bounce_physics_c4d5e6f7(x: float) -> None:
+def test_wall_bounce_c4d5e6f7(x: float) -> None:
     """
     Given: Any floating point input value
     When: compute_distance is called
@@ -148,6 +166,8 @@ def test_bounce_physics_c4d5e6f7(x: float) -> None:
     assert result >= 0
 ```
 
+A `@given`-decorated test missing `@pytest.mark.slow` is a FAIL at Step 5 review.
+
 ### Meaningful vs. Tautological Property Tests
 
 | Tautological (useless) | Meaningful (tests the contract) |
diff --git a/.opencode/skills/tdd/scripts/gen_test_stubs.py b/.opencode/skills/tdd/scripts/gen_test_stubs.py
index 0137608..ae2c09c 100644
--- a/.opencode/skills/tdd/scripts/gen_test_stubs.py
+++ b/.opencode/skills/tdd/scripts/gen_test_stubs.py
@@ -1,8 +1,14 @@
 """Generate and sync pytest test stubs from Gherkin .feature files.
 
-Scans all feature folders under docs/features/{backlog,in-progress,completed}/
+Scans all .feature files under docs/features/{backlog,in-progress,completed}/
 and creates or updates test stubs in tests/features/<feature-name>/.
 
+Each Rule: block in a .feature file maps to one test file:
+    tests/features/<feature-name>/<rule-slug>_test.py
+
+Test function naming:
+    test_<rule_slug>_<8char_hex>()
+
 Modes:
     uv run task gen-tests              Sync all features (default)
     uv run task gen-tests -- --check   Dry run — report what would change
@@ -54,30 +60,39 @@ class GherkinExample:
     source_file: str
 
 
+@dataclass(frozen=True, slots=True)
+class RuleBlock:
+    """A Rule: block with its examples, mapped to one test file."""
+
+    rule_title: str
+    rule_slug: str
+    examples: list[GherkinExample]
+
+
 @dataclass(frozen=True, slots=True)
 class FeatureFile:
-    """A parsed .feature file with its examples."""
+    """A parsed .feature file with its Rule blocks."""
 
     path: Path
     feature_name: str
-    story_slug: str
-    examples: list[GherkinExample]
+    feature_slug: str
+    rules: list[RuleBlock]
 
 
 def slugify(name: str) -> str:
-    """Convert a feature folder name to a Python-safe slug.
+    """Convert a name to a Python-safe slug.
 
     Args:
-        name: The feature folder name (kebab-case).
+        name: Kebab-case or space-separated name.
 
     Returns:
         Underscore-separated lowercase string.
     """
-    return name.replace("-", "_").lower()
+    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
 
 
 def parse_feature_file(path: Path) -> FeatureFile | None:
-    """Parse a .feature file into structured data.
+    """Parse a .feature file into structured data with Rule blocks.
 
     Args:
         path: Path to the .feature file.
@@ -91,30 +106,81 @@ def parse_feature_file(path: Path) -> FeatureFile | None:
     if not feature or not feature.get("name"):
         return None
 
-    story_slug = path.stem
-    examples = _extract_examples(feature, str(path))
+    feature_slug = slugify(path.stem)
+    rules = _extract_rules(feature, str(path))
     return FeatureFile(
         path=path,
         feature_name=feature["name"],
-        story_slug=story_slug,
-        examples=examples,
+        feature_slug=feature_slug,
+        rules=rules,
     )
 
 
-def _extract_examples(
-    feature: dict[str, Any], source_file: str
-) -> list[GherkinExample]:
-    """Extract all Example blocks from a parsed Gherkin feature AST.
+def _extract_rules(feature: dict[str, Any], source_file: str) -> list[RuleBlock]:
+    """Extract Rule blocks from a parsed Gherkin feature AST.
+
+    Each Rule: block becomes one RuleBlock with its examples.
+    Examples not under any Rule are grouped into a synthetic rule
+    using the feature name as the slug.
 
     Args:
         feature: The 'feature' dict from gherkin-official Parser output.
         source_file: Path string for provenance tracking.
 
+    Returns:
+        List of RuleBlock objects.
+    """
+    rules: list[RuleBlock] = []
+    orphan_examples: list[GherkinExample] = []
+
+    for child in feature.get("children", []):
+        rule_node: dict[str, Any] | None = child.get("rule")
+        scenario_node: dict[str, Any] | None = child.get("scenario")
+
+        if rule_node is not None:
+            rule_title = rule_node.get("name", "")
+            rule_slug = slugify(rule_title)
+            examples = _extract_examples_from_rule(rule_node, source_file)
+            if examples:
+                rules.append(
+                    RuleBlock(
+                        rule_title=rule_title,
+                        rule_slug=rule_slug,
+                        examples=examples,
+                    )
+                )
+        elif scenario_node is not None:
+            example = _scenario_to_example(scenario_node, source_file)
+            if example is not None:
+                orphan_examples.append(example)
+
+    if orphan_examples:
+        feature_slug = slugify(feature.get("name", "feature"))
+        rules.append(
+            RuleBlock(
+                rule_title=feature.get("name", ""),
+                rule_slug=feature_slug,
+                examples=orphan_examples,
+            )
+        )
+
+    return rules
+
+
+def _extract_examples_from_rule(
+    rule_node: dict[str, Any], source_file: str
+) -> list[GherkinExample]:
+    """Extract Example blocks from a Rule node.
+
+    Args:
+        rule_node: The 'rule' dict from the Gherkin AST.
+        source_file: Path string for provenance tracking.
+
     Returns:
         List of parsed GherkinExample objects.
     """
     examples: list[GherkinExample] = []
-    for child in feature.get("children", []):
+    for child in rule_node.get("children", []):
         scenario: dict[str, Any] | None = child.get("scenario")
         if scenario is None:
             continue
@@ -194,17 +260,17 @@ def _extract_steps(steps: list[dict[str, Any]]) -> tuple[str, str, str]:
     return given, when, then
 
 
-def generate_stub(feature_slug: str, example: GherkinExample) -> str:
+def generate_stub(rule_slug: str, example: GherkinExample) -> str:
     """Generate a single test stub function.
 
     Args:
-        feature_slug: Underscored feature folder name.
+        rule_slug: Underscored rule title (used as function prefix).
         example: The parsed Gherkin example.
 
     Returns:
         Complete test function source code as a string.
     """
-    func_name = f"test_{feature_slug}_{example.id_hex}"
+    func_name = f"test_{rule_slug}_{example.id_hex}"
     markers = ["@pytest.mark.unit"]
     if example.deprecated:
         markers.append("@pytest.mark.deprecated")
@@ -244,47 +310,38 @@ def _build_docstring(example: GherkinExample) -> list[str]:
     ]
 
 
-def generate_test_file(
-    feature_slug: str, story_slug: str, examples: list[GherkinExample]
-) -> str:
-    """Generate a complete test file for one .feature file.
+def generate_test_file(rule_slug: str, examples: list[GherkinExample]) -> str:
+    """Generate a complete test file for one Rule: block.
 
     Args:
-        feature_slug: Underscored feature folder name.
-        story_slug: The story file stem (becomes test file name).
-        examples: All examples from that .feature file.
+        rule_slug: Underscored rule title (file name stem + function prefix).
+        examples: All examples from that Rule block.
 
     Returns:
         Complete test module source code.
     """
     header = (
-        f'"""Tests for {story_slug.replace("_", " ")} story."""\n\nimport pytest\n\n\n'
+        f'"""Tests for {rule_slug.replace("_", " ")} rule."""\n\nimport pytest\n\n\n'
     )
-    stubs = "\n\n".join(generate_stub(feature_slug, ex) for ex in examples)
+    stubs = "\n\n".join(generate_stub(rule_slug, ex) for ex in examples)
     return header + stubs + "\n"
 
 
-def find_feature_folders() -> dict[str, list[tuple[Path, str]]]:
-    """Find all feature folders across all stages.
+def find_feature_files() -> list[tuple[Path, str, str]]:
+    """Find all .feature files across all stages.
 
     Returns:
-        Dict mapping feature folder name to list of (feature_file_path, stage).
+        List of (feature_file_path, feature_name, stage) tuples.
+        feature_name is the .feature file stem (e.g. 'display-version').
     """
-    features: dict[str, list[tuple[Path, str]]] = {}
+    results: list[tuple[Path, str, str]] = []
     for stage in FEATURE_STAGES:
         stage_dir = FEATURES_DIR / stage
         if not stage_dir.exists():
             continue
-        for folder in sorted(stage_dir.iterdir()):
-            if not folder.is_dir():
-                continue
-            feature_files = sorted(folder.glob("*.feature"))
-            if feature_files:
-                name = folder.name
-                features.setdefault(name, [])
-                for ff in feature_files:
-                    features[name].append((ff, stage))
-    return features
+        for feature_file in sorted(stage_dir.glob("*.feature")):
+            results.append((feature_file, feature_file.stem, stage))
+    return results
 
 
 def read_existing_test_ids(test_file: Path) -> set[str]:
@@ -303,20 +360,18 @@ def read_existing_test_ids(test_file: Path) -> set[str]:
 
 
 def sync_test_file(
-    feature_slug: str,
-    story_slug: str,
+    rule_slug: str,
     examples: list[GherkinExample],
     test_file: Path,
     stage: str,
     *,
     check_only: bool = False,
 ) -> list[str]:
-    """Sync a single test file with its .feature examples.
+    """Sync a single test file with its Rule: block examples.
 
     Args:
-        feature_slug: Underscored feature folder name.
-        story_slug: The story file stem.
-        examples: Parsed examples from the .feature file.
+        rule_slug: Underscored rule title.
+        examples: Parsed examples from the Rule block.
         test_file: Path to the test file to create/update.
         stage: Feature stage (backlog, in-progress, completed).
         check_only: If True, report changes without writing.
@@ -330,7 +385,7 @@ def sync_test_file(
     if not test_file.exists():
         if stage == "completed":
             return actions
-        content = generate_test_file(feature_slug, story_slug, examples)
+        content = generate_test_file(rule_slug, examples)
         actions.append(f"CREATE {test_file} ({len(examples)} stubs)")
         if not check_only:
             test_file.parent.mkdir(parents=True, exist_ok=True)
@@ -346,7 +401,7 @@ def sync_test_file(
 
     actions.extend(
         _sync_full(
-            feature_slug,
+            rule_slug,
             examples,
             example_ids,
             existing_ids,
@@ -410,7 +465,7 @@ def _sync_deprecated_markers(
 
 
 def _sync_full(
-    feature_slug: str,
+    rule_slug: str,
     examples: list[GherkinExample],
     example_ids: set[str],
     existing_ids: set[str],
@@ -421,7 +476,7 @@ def _sync_full(
     """Full sync for backlog/in-progress features.
 
     Args:
-        feature_slug: Underscored feature folder name.
+        rule_slug: Underscored rule title.
         examples: Parsed examples.
         example_ids: Set of IDs from .feature file.
         existing_ids: Set of IDs found in existing test file.
@@ -440,11 +495,11 @@ def _sync_full(
 
     for ex in examples:
         if ex.id_hex in new_ids:
-            stub = "\n\n" + generate_stub(feature_slug, ex)
+            stub = "\n\n" + generate_stub(rule_slug, ex)
             modified += stub
             actions.append(f"ADD stub for @id:{ex.id_hex}")
         elif ex.id_hex in existing_ids:
-            modified, doc_actions = _update_docstring(modified, feature_slug, ex)
+            modified, doc_actions = _update_docstring(modified, rule_slug, ex)
             actions.extend(doc_actions)
 
     for oid in orphan_ids:
@@ -472,13 +527,13 @@ def _sync_full(
 
 
 def _update_docstring(
-    text: str, feature_slug: str, example: GherkinExample
+    text: str, rule_slug: str, example: GherkinExample
 ) -> tuple[str, list[str]]:
     """Update the docstring of an existing test to match the .feature file.
 
     Args:
         text: Full test file content.
-        feature_slug: Underscored feature folder name.
+        rule_slug: Underscored rule title.
         example: The Gherkin example to match.
 
     Returns:
@@ -507,7 +562,7 @@ def _update_docstring(
 
     old_func = re.search(rf"def (test_\w+_{example.id_hex})\b", text)
     if old_func:
-        expected_name = f"test_{feature_slug}_{example.id_hex}"
+        expected_name = f"test_{rule_slug}_{example.id_hex}"
         if old_func.group(1) != expected_name:
             text = text.replace(old_func.group(1), expected_name)
             actions.append(f"RENAME {old_func.group(1)} -> {expected_name}")
@@ -515,28 +570,32 @@ def _update_docstring(
 
 
 def find_duplicate_ids() -> list[str]:
-    """Find @id hex values that appear in more than one .feature file.
+    """Find @id hex values that appear in more than one distinct feature file.
 
-    Args:
-        None.
+    A feature file that appears in multiple stage directories (backlog,
+    in-progress, completed) with the same stem is counted only once, which is
+    expected during migrations. An @id is flagged when it appears under more
+    than one distinct "<feature stem>/<rule slug>" location.
 
     Returns:
         List of warning strings describing each duplicate @id.
     """
-    id_sources: dict[str, list[str]] = {}
-    for name, files in find_feature_folders().items():
-        for fpath, _stage in files:
-            parsed = parse_feature_file(fpath)
-            if not parsed:
-                continue
-            for ex in parsed.examples:
-                id_sources.setdefault(ex.id_hex, []).append(f"{name}/{fpath.name}")
+    id_sources: dict[str, set[str]] = {}
+    for fpath, feature_name, _stage in find_feature_files():
+        parsed = parse_feature_file(fpath)
+        if not parsed:
+            continue
+        for rule in parsed.rules:
+            for ex in rule.examples:
+                id_sources.setdefault(ex.id_hex, set()).add(
+                    f"{feature_name}/{rule.rule_slug}"
+                )
 
     warnings: list[str] = []
     for id_hex, sources in sorted(id_sources.items()):
         if len(sources) > 1:
-            locations = ", ".join(sources)
-            warnings.append(f"@id:{id_hex} appears in multiple features: {locations}")
+            locations = ", ".join(sorted(sources))
+            warnings.append(f"@id:{id_hex} appears in multiple locations: {locations}")
     return warnings
 
 
@@ -547,12 +606,11 @@ def find_orphaned_tests() -> list[str]:
         List of orphan descriptions.
     """
     all_feature_ids: set[str] = set()
-    features = find_feature_folders()
-    for _name, files in features.items():
-        for fpath, _stage in files:
-            parsed = parse_feature_file(fpath)
-            if parsed:
-                all_feature_ids.update(ex.id_hex for ex in parsed.examples)
+    for fpath, _name, _stage in find_feature_files():
+        parsed = parse_feature_file(fpath)
+        if parsed:
+            for rule in parsed.rules:
+                all_feature_ids.update(ex.id_hex for ex in rule.examples)
 
     orphans: list[str] = []
     if not TESTS_DIR.exists():
@@ -566,12 +624,12 @@ def find_orphaned_tests() -> list[str]:
 
 
 def _sync_all_features(
-    features: dict[str, list[tuple[Path, str]]], *, check_only: bool
+    feature_files: list[tuple[Path, str, str]], *, check_only: bool
 ) -> int:
-    """Sync test stubs for all feature folders.
+    """Sync test stubs for all feature files.
 
     Args:
-        features: Mapping of feature name to list of (fpath, stage) tuples.
+        feature_files: List of (fpath, feature_name, stage) tuples.
         check_only: If True, report actions without writing files.
 
     Returns:
@@ -582,19 +640,16 @@ def _sync_all_features(
         print(f"WARNING: {warning}")
 
     all_actions: list[str] = []
-    for name, files in sorted(features.items()):
-        feature_slug = slugify(name)
-        for fpath, stage in files:
-            parsed = parse_feature_file(fpath)
-            if not parsed:
-                print(f"SKIP {fpath} — no Feature: line found")
-                continue
-            story_slug = slugify(parsed.story_slug)
-            test_file = TESTS_DIR / name / f"{story_slug}_test.py"
+    for fpath, feature_name, stage in sorted(feature_files):
+        parsed = parse_feature_file(fpath)
+        if not parsed:
+            print(f"SKIP {fpath} — no Feature: line found")
+            continue
+        for rule in parsed.rules:
+            test_file = TESTS_DIR / feature_name / f"{rule.rule_slug}_test.py"
             actions = sync_test_file(
-                feature_slug,
-                story_slug,
-                parsed.examples,
+                rule.rule_slug,
+                rule.examples,
                 test_file,
                 stage,
                 check_only=check_only,
@@ -631,12 +686,12 @@ def main() -> int:
         print("No orphaned tests found.")
         return 0
 
-    features = find_feature_folders()
-    if not features:
-        print("No feature folders with .feature files found.")
+    feature_files = find_feature_files()
+    if not feature_files:
+        print("No .feature files found.")
         return 0
 
-    return _sync_all_features(features, check_only=check_only)
+    return _sync_all_features(feature_files, check_only=check_only)
 
 
 if __name__ == "__main__":
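
Reviewer note: the mapping from a `Rule:` title to a test file and function name is easiest to check with a small standalone sketch. The `slugify` below copies the one-liner from this diff. `expected_test_location` is a hypothetical helper written only for illustration, and the `tests/features` root is an assumption about what `TESTS_DIR` points to. Note that the directory uses the raw `.feature` stem (not slugified), while the file name and function prefix use the slugified rule title.

```python
import re
from pathlib import Path


def slugify(name: str) -> str:
    """Same expression as the slugify() introduced in this diff."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def expected_test_location(
    tests_dir: Path, feature_stem: str, rule_title: str, id_hex: str
) -> tuple[Path, str]:
    """Hypothetical helper: where a stub for one Example should end up.

    Mirrors the path and name construction in _sync_all_features() and
    generate_stub(); it is not part of the script itself.
    """
    rule_slug = slugify(rule_title)
    # Directory keeps the raw .feature stem; file and function use the rule slug.
    test_file = tests_dir / feature_stem / f"{rule_slug}_test.py"
    return test_file, f"test_{rule_slug}_{id_hex}"


if __name__ == "__main__":
    location = expected_test_location(
        Path("tests/features"), "display-version", "Version retrieval", "3f2a1b4c"
    )
    print(location)
```

One consequence worth keeping in mind: a `Rule:` with an empty title slugifies to an empty string, so the generated file would be named `_test.py`. The diff does not guard against that case.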
diff --git a/.opencode/skills/verify/SKILL.md b/.opencode/skills/verify/SKILL.md
index 4a9ea5d..a78ec81 100644
--- a/.opencode/skills/verify/SKILL.md
+++ b/.opencode/skills/verify/SKILL.md
@@ -18,8 +18,9 @@ This skill guides the reviewer through Step 5: independent verification that the
 ## Scope Guard — Step 4 vs. Step 5
 
 If you are invoked for a **per-test code-design check during Step 4** (not a full Step 5 review):
-- The developer will provide a completed **Design Self-Declaration** checklist with `file:line` evidence for each rule.
+- The developer's completed **Design Self-Declaration** is in the `## Self-Declaration` block of `TODO.md`. Read it first.
 - **Independently verify each claim** against the actual code using sections 4a–4e (YAGNI, KISS, DRY, SOLID, Object Calisthenics, Design Patterns) and the semantic alignment check.
+- If any item in the `## Self-Declaration` block is unchecked or has no `file:line` evidence, reject immediately — the developer has not completed the self-declaration.
 - Do **NOT** run any commands (no lint, no static-check, no test suite).
 - Respond using the verification table template in `implementation/SKILL.md` — compare developer claims vs. your independent findings for each rule.
 
@@ -33,11 +34,11 @@ After the developer signals Step 4 is complete and all self-verification checks
 
 ### 1. Read the Feature Docs
 
-Read the feature folder `docs/features/in-progress/<name>/`. Extract:
-- All `@id` tags and their Example titles from `.feature` files
+Read `docs/features/in-progress/<name>.feature`. Extract:
+- All `@id` tags and their Example titles from `Rule:` blocks
 - The interaction model (if the feature involves user interaction)
-- The developer's pre-mortem (if present in the Architecture section of `discovery.md`)
-- The Rules and Constraints sections from `discovery.md`
+- The developer's pre-mortem (if present in the Architecture section of the feature description)
+- The Rules (Business) and Constraints sections from the feature description
 
 ### 2. Check Commit History
 
@@ -105,7 +106,7 @@ Read the source files changed in this feature. **Do this before running lint/sta
 | 5 | One dot per line | Reduces coupling to transitive dependencies | `a.b.c()` chains = FAIL | | |
 | 6 | No abbreviations | Names are documentation; abbreviations lose meaning | `mgr`, `tmp`, `calc` = FAIL | | |
 | 7 | Small entities | Smaller units are easier to test, read, and replace | Functions > 20 lines or classes > 50 lines = FAIL | | |
-| 8 | ≤ 2 instance variables | Forces single responsibility through structural constraint | Count `self.x` assignments in `__init__` | | |
+| 8 | ≤ 2 instance variables | Forces responsibility splitting by making it structurally impossible to hold too much state in one class | For EVERY class: count `self.x` in `__init__`. If > 2: FAIL immediately. The only valid fix is a new named value object (OC-3) or collection class (OC-4). Invalid workarounds = FAIL: hardcoded constants, inlined literals, class-level variables, moving fields to a parent class, or merging into a dict/tuple. | | |
 | 9 | No getters/setters | Enforces tell-don't-ask; behavior lives with data | `get_x()`/`set_x()` pairs = FAIL | | |
 
 #### 4e. Design Patterns — any FAIL → REJECTED
@@ -127,7 +128,8 @@ Read the source files changed in this feature. **Do this before running lint/sta
 | No internal attribute access | Search for `_x` in assertions | None found | `_x`, `isinstance`, `type()` found | Replace with public API assertion |
 | Every `@id` has a mapped test | Match `@id` tags in `.feature` files to test functions | All mapped | Missing test | Write the missing test |
 | No `@id` used by two functions | Check for duplicate `@id` hex in test function names | None | Duplicate found | Consolidate into Hypothesis `@given` + `@example` or escalate to PO |
-| Function naming | Test names match `test_<feature_slug>_<8char_hex>` | All match | Mismatch | Rename function |
+| Function naming | Test names match `test_<rule_slug>_<8char_hex>` | All match | Mismatch | Rename function |
+| All Hypothesis tests have `@pytest.mark.slow` | Check every `@given`-decorated test for the `@pytest.mark.slow` marker | All present | Any missing | Add `@pytest.mark.slow` |
 
 #### 4g. Code Quality — any FAIL → REJECTED
 
@@ -184,7 +186,7 @@ Record what input was given and what output was observed.
 ### @id Traceability
 | @id | Example Title | Test | Status |
 |-----|---------------|------|--------|
-| `@id:a3f2b1c4` | <title> | `tests/features/<name>/<story>_test.py::test_<slug>_a3f2b1c4` | COVERED / NOT COVERED |
+| `@id:a3f2b1c4` | <title> | `tests/features/<name>/<rule>_test.py::test_<rule_slug>_a3f2b1c4` | COVERED / NOT COVERED |
 
 ### Code Review Findings
 - PASS: <aspect>
@@ -222,3 +224,4 @@ OR
 | Duplicate `@id` in tests | 0 |
 | Empty evidence cells | 0 |
 | Orphaned tests | 0 |
+| Hypothesis tests missing `@pytest.mark.slow` | 0 |
diff --git a/AGENTS.md b/AGENTS.md
index de6a04c..15a13d2 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -5,9 +5,9 @@ A Python template to quickstart any project with a production-ready workflow, qu
 ## Workflow Overview
 
 Features flow through 6 steps with a WIP limit of 1 feature at a time. The filesystem enforces WIP:
-- `docs/features/backlog/<feature-name>/` — features waiting to be worked on
-- `docs/features/in-progress/<feature-name>/` — exactly one feature being built right now
-- `docs/features/completed/<feature-name>/` — accepted and shipped features
+- `docs/features/backlog/<feature-name>.feature` — features waiting to be worked on
+- `docs/features/in-progress/<feature-name>.feature` — exactly one feature being built right now
+- `docs/features/completed/<feature-name>.feature` — accepted and shipped features
 
 ```
 STEP 1: SCOPE          (product-owner)  → discovery + Gherkin stories + criteria
@@ -55,19 +55,19 @@ STEP 6: ACCEPT         (product-owner)  → demo, validate, move folder to compl
 ## Step 1 — SCOPE (4 Phases)
 
 ### Phase 1 — Project Discovery (once per project)
-PO creates `docs/features/discovery.md`. Asks stakeholder 7 standard questions (Who/What/Why/When/Success/Failure/Out-of-scope). Silent pre-mortem generates follow-up questions. All questions presented at once. Autonomous baseline when all questions are answered. PO identifies feature list and creates `backlog/<name>/discovery.md` per feature.
+PO creates `docs/features/discovery.md`. Asks stakeholder 7 standard questions (Who/What/Why/When/Success/Failure/Out-of-scope). Silent pre-mortem generates follow-up questions. All questions presented at once. Autonomous baseline when all questions are answered. PO identifies feature list and creates one `backlog/<feature-name>.feature` file per feature (discovery section only).
 
 ### Phase 2 — Feature Discovery (per feature)
-PO derives targeted questions from feature entities: extract nouns/verbs from project discovery, populate the Entities table, then generate questions from gaps, ambiguities, and boundary conditions. Silent pre-mortem before the first interview round. Present all questions to the stakeholder at once; iterate with follow-up rounds (pre-mortem after each) until stakeholder says "baseline" to freeze discovery.
+PO derives targeted questions from feature entities: extract nouns/verbs from project discovery, populate the Entities table in the feature file description, then generate questions from gaps, ambiguities, and boundary conditions. Silent pre-mortem before the first interview round. Present all questions to the stakeholder at once; iterate with follow-up rounds (pre-mortem after each) until stakeholder says "baseline" to freeze discovery.
 
 ### Phase 3 — Stories (PO alone)
-One `.feature` file per user story. `Feature:` block with user story header only — no `Example:` blocks yet. Commit: `feat(stories): write user stories for <name>`
+One `Rule:` block per user story within the feature's `.feature` file. Each `Rule:` has the user story header (`As a / I want / So that`) as its description — no `Example:` blocks yet. Commit: `feat(stories): write user stories for <name>`
 
 ### Phase 4 — Criteria (PO alone)
-Silent pre-mortem per story. Write `Example:` blocks with `@id:<8-char-hex>` tags. Each Example must be observably distinct; if a single `.feature` file spans multiple concerns, split into separate `.feature` files (a feature folder can contain multiple `.feature` files). Commit: `feat(criteria): write acceptance criteria for <name>`
+Silent pre-mortem per Rule. Write `Example:` blocks with `@id:<8-char-hex>` tags under each `Rule:`. Each Example must be observably distinct. If a single feature spans **>2 distinct concerns** OR has **>8 candidate Examples**, split it into separate `.feature` files in `backlog/` before writing Examples. Commit: `feat(criteria): write acceptance criteria for <name>`
 
 ### Feature Decomposition Threshold
-Before moving to Phase 3, check: does this feature span **>2 distinct concerns** OR have **>8 candidate Examples**? If yes, split into separate features in `backlog/` before writing stories. Each feature should address a single cohesive concern.
+Before moving to Phase 3, check: does this feature span **>2 distinct concerns** OR have **>8 candidate Examples**? If yes, split into separate `.feature` files in `backlog/` before writing Rules. Each feature file should address a single cohesive concern.
 
 **Baseline is frozen**: no `.feature` changes after criteria are written. Change = `@deprecated` tag + new Example.
 
@@ -76,40 +76,71 @@ Before moving to Phase 3, check: does this feature span **>2 distinct concerns**
 ```
 docs/features/
   discovery.md                        ← project-level (Status + Questions only)
-  backlog/<feature-name>/
-    discovery.md                      ← Status + Entities + Rules + Constraints + Questions
-    <story-slug>.feature              ← one per user story (Gherkin)
-  in-progress/<feature-name>/         ← whole folder moves here at Step 2
-  completed/<feature-name>/           ← whole folder moves here at Step 6
+  backlog/<feature-name>.feature      ← one per feature; discovery + Rules + Examples
+  in-progress/<feature-name>.feature  ← file moves here at Step 2
+  completed/<feature-name>.feature    ← file moves here at Step 6
 
 tests/
   features/<feature-name>/
-    <story-slug>_test.py              ← one per .feature, stubs from gen-tests
+    <rule-slug>_test.py               ← one per Rule: block, stubs from gen-tests
   unit/
-    <anything>_test.py                ← developer-authored extras
+    <anything>_test.py                ← developer-authored extras (no @id traceability)
 ```
 
+Tests in `tests/unit/` are developer-authored extras not covered by any `@id` criterion. Any test style is valid — plain `assert` or Hypothesis `@given`. Use Hypothesis when the test covers a **property** that holds across many inputs (mathematical invariants, parsing contracts, value object constraints). Use plain pytest for specific behaviors or single edge cases discovered during refactoring.
+
+- `@pytest.mark.slow` is mandatory on every `@given`-decorated test (each property test runs many generated inputs, so it is slow by design)
+- `@example(...)` is optional but encouraged when using `@given` to document known corner cases
+- No `@id` tags — tests with `@id` belong in `tests/features/`, generated by `gen-tests`
+
 ## Gherkin Format
 
 ```gherkin
 Feature: Bounce physics
-  As a game engine
-  I want balls to bounce off walls
-  So that gameplay feels physical
-
-  @id:a3f2b1c4
-  Example: Ball bounces off top wall
-    Given a ball moving upward reaches y=0
-    When the physics engine processes the next frame
-    Then the ball velocity y-component becomes positive
-
-  @deprecated @id:b5c6d7e8
-  Example: Old behavior no longer needed
-    Given ...
-    When ...
-    Then ...
+
+  Discovery:
+
+  Status: BASELINED (2026-01-10)
+
+  Entities:
+  | Type | Name | Candidate Class/Method | In Scope |
+  |------|------|----------------------|----------|
+  | Noun | Ball | Ball | Yes |
+  | Verb | Bounce | Ball.bounce() | Yes |
+
+  Rules (Business):
+  - Ball velocity reverses on wall contact
+
+  Constraints:
+  - Physics runs at 60fps
+
+  Questions:
+  | ID | Question | Answer | Status |
+  |----|----------|--------|--------|
+  | Q1 | Does gravity apply? | No, constant velocity | ANSWERED |
+
+  All questions answered. Discovery frozen.
+
+  Rule: Wall bounce
+    As a game engine
+    I want balls to bounce off walls
+    So that gameplay feels physical
+
+    @id:a3f2b1c4
+    Example: Ball bounces off top wall
+      Given a ball moving upward reaches y=0
+      When the physics engine processes the next frame
+      Then the ball velocity y-component becomes positive
+
+    @deprecated @id:b5c6d7e8
+    Example: Old behavior no longer needed
+      Given ...
+      When ...
+      Then ...
 ```
 
+- Each feature is a **single `.feature` file**; user stories are `Rule:` blocks within it
+- The feature description (free text before the first `Rule:`) contains all discovery content: Status, Entities, Rules (business), Constraints, Questions, and later Architecture
 - `@id:<8-char-hex>` — generated with `uv run task gen-id`
 - `@deprecated` — marks superseded criteria; `gen-tests` adds `@pytest.mark.deprecated` to the mapped test
 - `Example:` keyword (not `Scenario:`)
@@ -132,20 +163,20 @@ uv run task gen-tests -- --orphans # list orphaned tests
 ### Test File Layout
 
 ```
-tests/features/<feature-name>/<story-slug>_test.py
+tests/features/<feature-name>/<rule-slug>_test.py
 ```
 
 ### Function Naming
 
 ```python
-def test_<feature_slug>_<8char_hex>() -> None:
+def test_<rule_slug>_<8char_hex>() -> None:
 ```
 
 ### Docstring Format (mandatory)
 
 ```python
 @pytest.mark.unit
-def test_bounce_physics_a3f2b1c4() -> None:
+def test_wall_bounce_a3f2b1c4() -> None:
     """
     Given: A ball moving upward reaches y=0
     When: The physics engine processes the next frame
@@ -262,7 +293,7 @@ Every session: load `skill session-workflow`. Read `TODO.md` first, update it at
 
 Feature: <name>
 Step: <1-6> (<step name>)
-Source: docs/features/in-progress/<name>/discovery.md
+Source: docs/features/in-progress/<name>.feature
 
 ## Progress
 - [x] `<@id:hex>`: <description>          ← done
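
Reviewer note: the function naming convention `test_<rule_slug>_<8char_hex>` described in AGENTS.md can be checked with a single regular expression. The sketch below is illustrative only. `TEST_NAME` and `parse_test_name` are hypothetical names, and the pattern assumes slugs contain only lowercase letters, digits, and single underscores, which is what `slugify` produces, and that the hex id is lowercase.

```python
import re
from typing import Optional

# test_<rule_slug>_<8char_hex>, where the slug is lowercase alphanumerics
# joined by single underscores and the id is exactly 8 lowercase hex digits.
TEST_NAME = re.compile(
    r"^test_(?P<rule_slug>[a-z0-9]+(?:_[a-z0-9]+)*)_(?P<id_hex>[0-9a-f]{8})$"
)


def parse_test_name(name: str) -> Optional[tuple[str, str]]:
    """Split a test function name into (rule_slug, id_hex), or return None."""
    match = TEST_NAME.match(name)
    if match is None:
        return None
    return match["rule_slug"], match["id_hex"]
```

For the docstring example in AGENTS.md, `parse_test_name("test_wall_bounce_a3f2b1c4")` yields the slug `wall_bounce` and the id `a3f2b1c4`. Names without a trailing 8-character hex id are rejected.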
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3fdca57..570f818 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,27 @@
 
 All notable changes to this template will be documented in this file.
 
+## [v4.1.20260416] - Recursive Acinonyx - 2026-04-16
+
+### Added
+- **Single `.feature` file per feature**: Each feature is now one `.feature` file with `Rule:` blocks for user stories and `Example:` blocks for ACs — discovery content embedded in the feature description free text; replaces the folder-per-feature structure
+- **Rule-scoped test files**: `gen_test_stubs.py` rewritten to parse `Rule:` blocks; each Rule maps to one test file (`<rule-slug>_test.py`); function naming is now `test_<rule_slug>_<id_hex>()`
+- **Hypothesis-only `tests/unit/`**: Every test in `tests/unit/` must use `@given`; `@pytest.mark.slow` is mandatory on all Hypothesis tests; plain `assert` tests without `@given` are forbidden
+- **Mandatory `## Self-Declaration` in TODO.md**: Developer writes the 21-item checklist into a `## Self-Declaration (@id:<hex>)` block in `TODO.md` at `SELF-DECLARE` phase before requesting reviewer check (Rule 8 in session-workflow)
+
+### Changed
+- **`gen_test_stubs.py`**: Scans `docs/features/{backlog,in-progress,completed}/*.feature` directly (not subfolders); generates one test file per `Rule:` block
+- **`gen_todo.py`**: `find_in_progress_feature()` now finds `.feature` files directly in `in-progress/`; source path is `docs/features/in-progress/<name>.feature`
+- **`skills/tdd/SKILL.md`**: Test Tool Decision table updated to separate `tests/features/` (plain pytest, generated) from `tests/unit/` (Hypothesis only); `tests/unit/` rules section added
+- **`skills/implementation/SKILL.md`**: Unit test rule tightened — `@given` required, `@pytest.mark.slow` mandatory, plain tests forbidden
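+- **`skills/verify/SKILL.md`**: New row in section 4f requiring `@pytest.mark.slow` on every `@given`-decorated test, with a matching row in Standards Summary; function-naming check updated to `test_<rule_slug>_<8char_hex>`; Step 1 reads the single `.feature` file instead of the feature folder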
+- **`skills/scope/SKILL.md`**: All four phases rewritten for file-based workflow; `discovery-template.md` converted to `.feature` file template
+- **`skills/session-workflow/SKILL.md`**: Step 4 TODO format updated with mandatory `## Self-Declaration` block template; Rule 8 added
+- **Completed feature migrated**: `docs/features/completed/display-version/` (three files) merged into `docs/features/completed/display-version.feature` (single file with two `Rule:` blocks)
+
+### Fixed
+- **OC-8 clarification**: The only valid fix for > 2 `self.x` is a new named class (Rule 3 or Rule 4); hardcoded constants, class-level variables, inlined literals, and parent-class moves are all invalid workarounds and remain FAIL
+
 ## [v4.0.20260416] - Precise Tarsius - 2026-04-16
 
 ### Added
diff --git a/docs/features/completed/display-version.feature b/docs/features/completed/display-version.feature
new file mode 100644
index 0000000..e390000
--- /dev/null
+++ b/docs/features/completed/display-version.feature
@@ -0,0 +1,76 @@
+Feature: Display version
+
+  Discovery:
+
+  Status: COMPLETED
+
+  Entities:
+  | Type | Name             | Candidate Class/Method      | In Scope |
+  |------|------------------|-----------------------------|----------|
+  | Noun | Version string   | version()                   | Yes      |
+  | Noun | pyproject.toml   | (source of truth)           | Yes      |
+  | Noun | Log output       | logging                     | Yes      |
+  | Noun | Verbosity level  | ValidVerbosity              | Yes      |
+  | Noun | Entry point      | main()                      | Yes      |
+  | Verb | Retrieve         | version()                   | Yes      |
+  | Verb | Display / Log    | main()                      | Yes      |
+  | Verb | Configure        | ValidVerbosity              | Yes      |
+  | Verb | Validate         | main() raises ValueError    | Yes      |
+
+  Rules (Business):
+  - Version is read from pyproject.toml at runtime using tomllib
+  - Log verbosity is controlled by a ValidVerbosity parameter passed to main()
+  - Valid verbosity levels are: DEBUG, INFO, WARNING, ERROR, CRITICAL
+  - An invalid verbosity value raises a ValueError with the invalid value and valid options
+  - The version string is logged at INFO level; visible at DEBUG and INFO, not at WARNING+
+
+  Constraints:
+  - No hardcoded __version__ constant — pyproject.toml is the single source of truth
+  - Entry point: app/__main__.py (main(verbosity) function)
+  - Version logic: app/version.py (version() function)
+
+  Questions:
+  | ID | Question | Answer | Status |
+  |----|----------|--------|--------|
+
+  All questions answered. Discovery frozen.
+
+  Rule: Version retrieval
+    As a developer
+    I want to retrieve the application version programmatically
+    So that I can display or log it at runtime
+
+    @id:3f2a1b4c
+    Example: Version string is read from pyproject.toml
+      Given pyproject.toml exists with a version field
+      When version() is called
+      Then the returned string matches the version in pyproject.toml
+
+    @id:7a8b9c0d
+    Example: Version call emits an INFO log message
+      Given pyproject.toml exists with a version field
+      When version() is called
+      Then an INFO log message in the format "Version: <version>" is emitted
+
+  Rule: Verbosity control
+    As a developer
+    I want to control log verbosity via a parameter
+    So that I can tune output for different environments
+
+    @id:a1b2c3d4
+    Example: Version appears in logs at DEBUG and INFO verbosity
+      Given a verbosity level of DEBUG or INFO is passed to main()
+      When main() is called
+      Then the version string appears in the log output
+
+    @id:b2c3d4e5
+    Example: Version is absent from logs at WARNING and above
+      Given a verbosity level of WARNING, ERROR, or CRITICAL is passed to main()
+      When main() is called
+      Then the version string does not appear in the log output
+
+    @id:e5f6a7b8
+    Example: Invalid verbosity raises a descriptive error
+      Given an invalid verbosity string is passed to main()
+      When main() is called
+      Then a ValueError is raised with the invalid value and valid options listed
diff --git a/docs/features/completed/display-version/discovery.md b/docs/features/completed/display-version/discovery.md
deleted file mode 100644
index 3fc335c..0000000
--- a/docs/features/completed/display-version/discovery.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# Feature Discovery: display-version
-
-## Status
-completed
-
-## Entities
-
-**Nouns**: version string, pyproject.toml, log output, verbosity level, entry point  
-**Verbs**: retrieve, display, log, configure, validate
-
-## Rules
-- Version is read from `pyproject.toml` at runtime using `tomllib`
-- Log verbosity is controlled by a `ValidVerbosity` parameter passed to `main()`
-- Valid verbosity levels are: DEBUG, INFO, WARNING, ERROR, CRITICAL
-- An invalid verbosity value raises a `ValueError` with the invalid value and the list of valid options
-- The version string is logged at INFO level; it is visible at DEBUG and INFO but not at WARNING or above
-
-## Constraints
-- No hardcoded `__version__` constant — `pyproject.toml` is the single source of truth
-- Entry point: `app/__main__.py` (`main(verbosity)` function)
-- Version logic: `app/version.py` (`version()` function)
-
-## Questions
-All questions answered. Discovery frozen.
diff --git a/docs/features/completed/display-version/verbosity-control.feature b/docs/features/completed/display-version/verbosity-control.feature
deleted file mode 100644
index 4a16c05..0000000
--- a/docs/features/completed/display-version/verbosity-control.feature
+++ /dev/null
@@ -1,22 +0,0 @@
-Feature: Verbosity control
-  As a developer
-  I want to control log verbosity via a parameter
-  So that I can tune output for different environments
-
-  @id:a1b2c3d4
-  Example: Version appears in logs at DEBUG and INFO verbosity
-    Given a verbosity level of DEBUG or INFO is passed to main()
-    When main() is called
-    Then the version string appears in the log output
-
-  @id:b2c3d4e5
-  Example: Version is absent from logs at WARNING and above
-    Given a verbosity level of WARNING, ERROR, or CRITICAL is passed to main()
-    When main() is called
-    Then the version string does not appear in the log output
-
-  @id:e5f6a7b8
-  Example: Invalid verbosity raises a descriptive error
-    Given an invalid verbosity string is passed to main()
-    When main() is called
-    Then a ValueError is raised with the invalid value and valid options listed
diff --git a/docs/features/completed/display-version/version-retrieval.feature b/docs/features/completed/display-version/version-retrieval.feature
deleted file mode 100644
index 9150195..0000000
--- a/docs/features/completed/display-version/version-retrieval.feature
+++ /dev/null
@@ -1,16 +0,0 @@
-Feature: Version retrieval
-  As a developer
-  I want to retrieve the application version programmatically
-  So that I can display or log it at runtime
-
-  @id:3f2a1b4c
-  Example: Version string is read from pyproject.toml
-    Given pyproject.toml exists with a version field
-    When version() is called
-    Then the returned string matches the version in pyproject.toml
-
-  @id:7a8b9c0d
-  Example: Version call emits an INFO log message
-    Given pyproject.toml exists with a version field
-    When version() is called
-    Then an INFO log message in the format "Version: <version>" is emitted
diff --git a/docs/workflow.md b/docs/workflow.md
new file mode 100644
index 0000000..6d952ed
--- /dev/null
+++ b/docs/workflow.md
@@ -0,0 +1,239 @@
+# Development Workflow
+
+This document describes the complete feature lifecycle used to develop software with this framework.
+
+---
+
+## Overview
+
+Features flow through 6 steps with a WIP limit of 1 feature at a time. The filesystem enforces the limit:
+
+```
+docs/features/backlog/<name>.feature      ← waiting
+docs/features/in-progress/<name>.feature  ← exactly one being built
+docs/features/completed/<name>.feature    ← accepted and shipped
+```
+
+Each step has a designated agent and a specific deliverable. No step is skipped.
+
+---
+
+## Full Workflow Diagram
+
+```
+╔══════════════════════════════════════════════════════════════════════╗
+║                    FEATURE LIFECYCLE (WIP = 1)                       ║
+╚══════════════════════════════════════════════════════════════════════╝
+
+  FILESYSTEM ENFORCES WIP:
+  backlog/<name>.feature  →  in-progress/<name>.feature  →  completed/<name>.feature
+
+
+┌─────────────────────────────────────────────────────────────────────┐
+│  STEP 1 — SCOPE                              agent: product-owner   │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│  Phase 1 — Project Discovery (once per project)                     │
+│    PO asks stakeholder 7 questions → silent pre-mortem              │
+│    → baseline → create backlog/<name>.feature stubs                 │
+│                                                                     │
+│  Phase 2 — Feature Discovery (per feature)                          │
+│    PO populates Entities table → generates questions from gaps      │
+│    → interview rounds → stakeholder says "baseline"                 │
+│    → decomposition check (>2 concerns or >8 examples → split)       │
+│                                                                     │
+│  Phase 3 — Stories (PO alone)                                       │
+│    Write Rule: blocks with user story headers (no Examples yet)     │
+│    commit: feat(stories): write user stories for <name>             │
+│                                                                     │
+│  Phase 4 — Criteria (PO alone)                                      │
+│    Write @id-tagged Example: blocks under each Rule:                │
+│    commit: feat(criteria): write acceptance criteria for <name>     │
+│    ★ FROZEN — changes require @deprecated + new Example             │
+│                                                                     │
+└─────────────────────────────────────────────────────────────────────┘
+                              ↓  PO picks feature from backlog
+┌─────────────────────────────────────────────────────────────────────┐
+│  STEP 2 — ARCHITECTURE                           agent: developer   │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│  mv backlog/<name>.feature → in-progress/<name>.feature             │
+│  Read discovery + feature file                                      │
+│  Silent pre-mortem (YAGNI/KISS/DRY/SOLID/OC/patterns)               │
+│  Append Architecture section to feature file description            │
+│    (Module Structure + ADRs + Build Changes)                        │
+│  Architecture contradiction check → PO acknowledges                 │
+│  commit: feat(<name>): add architecture                             │
+│                                                                     │
+└─────────────────────────────────────────────────────────────────────┘
+                              ↓
+┌─────────────────────────────────────────────────────────────────────┐
+│  STEP 3 — TEST FIRST                             agent: developer   │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│  uv run task gen-tests   →  creates tests/features/<name>/          │
+│                              one <rule-slug>_test.py per Rule:      │
+│                              test_<rule_slug>_<hex>() per Example   │
+│  Write test bodies (real assertions, not raise NotImplementedError) │
+│  Confirm every test FAILS (ImportError / AssertionError)            │
+│  ★ STOP — reviewer checks test design + semantic alignment          │
+│  ★ WAIT for APPROVED                                                │
+│  commit: test(<name>): write failing tests                          │
+│                                                                     │
+└─────────────────────────────────────────────────────────────────────┘
+                              ↓
+┌─────────────────────────────────────────────────────────────────────┐
+│  STEP 4 — IMPLEMENT                              agent: developer   │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│  For each failing test (one at a time):                             │
+│                                                                     │
+│    RED → GREEN → REFACTOR → SELF-DECLARE ─STOP─ REVIEWER ─WAIT─     │
+│                                                          ↓ APPROVED │
+│                                                       COMMIT        │
+│                                                          ↓          │
+│                                                    next test        │
+│                                                                     │
+│  RED:         confirm test fails                                    │
+│  GREEN:       minimum code to pass (YAGNI + KISS only)              │
+│  REFACTOR:    DRY → SOLID → Object Calisthenics (9 rules)           │
+│               → type hints → docstrings                             │
+│  SELF-DECLARE: write ## Self-Declaration block in TODO.md           │
+│               21-item checklist with file:line evidence             │
+│  REVIEWER:    code-design check only (no lint/pyright/coverage)     │
+│  COMMIT:      feat(<name>): implement <what>                        │
+│                                                                     │
+│  After all tests green:                                             │
+│    lint + static-check + test + timeout run  (all must pass)        │
+│    developer pre-mortem (2-3 sentences)                             │
+│                                                                     │
+└─────────────────────────────────────────────────────────────────────┘
+                              ↓
+┌─────────────────────────────────────────────────────────────────────┐
+│  STEP 5 — VERIFY                                  agent: reviewer   │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│  Default hypothesis: broken despite green checks                    │
+│                                                                     │
+│  1. Read feature file — all @id Examples, interaction model         │
+│  2. Check commit history — one commit per test, clean status        │
+│  3. Production-grade gate:                                          │
+│       app exits cleanly + output changes with input                 │
+│  4. Code review (stop on first failure):                            │
+│       4a Correctness (dead code, DRY, YAGNI)                        │
+│       4b KISS (one thing, nesting, size)                            │
+│       4c SOLID (5-row table)                                        │
+│       4d Object Calisthenics (9-row table)                          │
+│       4e Design Patterns (5 smells)                                 │
+│       4f Tests (docstrings, contracts, @id coverage, naming)        │
+│       4g Code Quality (noqa, type hints, docstrings, coverage)      │
+│  5. Run: gen-tests --orphans → lint → static-check → test           │
+│  6. Interactive verification (if UI involved)                       │
+│  7. Written report: APPROVED or REJECTED                            │
+│                                                                     │
+└─────────────────────────────────────────────────────────────────────┘
+                              ↓ APPROVED
+┌─────────────────────────────────────────────────────────────────────┐
+│  STEP 6 — ACCEPT                             agent: product-owner   │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│  PO runs/observes the feature (real user interaction)               │
+│  Checks against original Rule: user stories                         │
+│                                                                     │
+│  ACCEPTED:                                                          │
+│    mv in-progress/<name>.feature → completed/<name>.feature         │
+│    developer creates PR + tags release                              │
+│                                                                     │
+│  REJECTED:                                                          │
+│    feedback in TODO.md → back to relevant step                      │
+│                                                                     │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Supporting Tools
+
+| Command | When | Purpose |
+|---|---|---|
+| `uv run task gen-tests` | Step 3, Step 4 | Reads `.feature` files → creates/syncs test stubs in `tests/features/` |
+| `uv run task gen-tests -- --check` | Before gen-tests | Dry run — preview what would change |
+| `uv run task gen-tests -- --orphans` | Step 5 | List tests with no matching `@id` |
+| `uv run task gen-todo` | Every session | Reads in-progress `.feature` → syncs `TODO.md` |
+| `uv run task gen-id` | Step 1 Phase 4 | Generate 8-char hex `@id` for a new Example |
+| `uv run task test-fast` | Step 4 cycle | Fast test run (no coverage) — used during Red-Green-Refactor |
+| `uv run task test` | Handoff, Step 5 | Full suite with coverage — must reach 100% |
+| `uv run task lint` | Handoff, Step 5 | ruff — must exit 0 |
+| `uv run task static-check` | Handoff, Step 5 | pyright — must exit 0, 0 errors |
+| `timeout 10s uv run task run` | Handoff, Step 5 | App must exit cleanly (exit 124 = hang = fix it) |
+
+---
+
+## Test Layout
+
+```
+tests/
+  features/<feature-name>/
+    <rule-slug>_test.py     ← generated by gen-tests, one per Rule: block
+                              function: test_<rule_slug>_<8char_hex>()
+  unit/
+    <anything>_test.py      ← developer-authored extras, no @id traceability
+                              Hypothesis @given only, each marked @pytest.mark.slow
+```
+
+---
+
+## TODO.md Structure (Step 4)
+
+```markdown
+# Current Work
+
+Feature: <name>
+Step: 4 (implement)
+Source: docs/features/in-progress/<name>.feature
+
+## Cycle State
+Test: @id:<hex> — <description>
+Phase: RED | GREEN | REFACTOR | SELF-DECLARE | REVIEWER(code-design) | COMMITTED
+
+## Self-Declaration (@id:<hex>)
+- [x] YAGNI-1 … SOLID-D … OC-1…OC-9 … Semantic  (21 items, file:line each)
+
+## Progress
+- [x] @id:<hex>: <done> — reviewer(code-design) APPROVED
+- [~] @id:<hex>: <in progress>
+- [ ] @id:<hex>: <next>
+
+## Next
+<one actionable sentence>
+```
+
+---
+
+## Roles
+
+| Role | Type | Responsibilities |
+|---|---|---|
+| **Stakeholder** | Human | Answers questions, provides domain knowledge, says "baseline" |
+| **Product Owner** | AI agent | Interviews stakeholder, writes `.feature` files, picks features, accepts deliveries |
+| **Developer** | AI agent | Architecture, tests, code, git, releases |
+| **Reviewer** | AI agent | Adversarial verification — defaults to REJECTED until proven correct |
+
+---
+
+## Quality Gates (non-negotiable)
+
+| Gate | Standard |
+|---|---|
+| Test coverage | 100% |
+| Type errors (pyright) | 0 |
+| Lint errors (ruff) | 0 |
+| Function length | ≤ 20 lines |
+| Class length | ≤ 50 lines |
+| Max nesting | 2 levels |
+| Instance variables per class | ≤ 2 |
+| Uncovered `@id` tags | 0 |
+| `noqa` comments | 0 |
+| `type: ignore` comments | 0 |
+| Orphaned tests | 0 |
+| Hypothesis tests missing `@pytest.mark.slow` | 0 |
diff --git a/pyproject.toml b/pyproject.toml
index 7958c61..6658085 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "python-project-template"
-version = "4.0.20260416"
+version = "4.1.20260416"
 description = "Python template with some awesome tools to quickstart any Python project"
 readme = "README.md"
 requires-python = ">=3.13"
diff --git a/uv.lock b/uv.lock
index e2344e8..e95d401 100644
--- a/uv.lock
+++ b/uv.lock
@@ -735,7 +735,7 @@ wheels = [
 
 [[package]]
 name = "python-project-template"
-version = "4.0.20260416"
+version = "4.1.20260416"
 source = { virtual = "." }
 dependencies = [
     { name = "fire" },