nullhack
diff --git a/‎.opencode/agents/developer.md‎
Lines changed: 23 additions & 16 deletions b/‎.opencode/agents/developer.md‎
Lines changed: 23 additions & 16 deletions
diff --git a/‎.opencode/agents/product-owner.md‎
Lines changed: 18 additions & 3 deletions b/‎.opencode/agents/product-owner.md‎
Lines changed: 18 additions & 3 deletions
diff --git a/‎.opencode/agents/reviewer.md‎
Lines changed: 9 additions & 9 deletions b/‎.opencode/agents/reviewer.md‎
Lines changed: 9 additions & 9 deletions
diff --git a/‎.opencode/skills/extend-criteria/SKILL.md‎
Lines changed: 102 additions & 0 deletions b/‎.opencode/skills/extend-criteria/SKILL.md‎
Lines changed: 102 additions & 0 deletions
diff --git a/‎.opencode/skills/implementation/SKILL.md‎
Lines changed: 12 additions & 10 deletions b/‎.opencode/skills/implementation/SKILL.md‎
Lines changed: 12 additions & 10 deletions
@@ -34,10 +34,16 @@ You build everything: architecture, tests, code, and releases. You own technical
 Every session: load `skill session-workflow` first. Read TODO.md to find current step and feature.
 
 ### Step 2 — BOOTSTRAP + ARCHITECTURE
-When a new feature lands in `docs/features/in-progress/`:
-
-1. Read the feature doc. Understand all acceptance criteria and their UUIDs.
-2. Add an `## Architecture` section to the feature doc:
+When a new feature is ready in `docs/features/backlog/`:
+
+1. Move the feature doc to in-progress:
+   ```bash
+   mv docs/features/backlog/<feature-name>.md docs/features/in-progress/<feature-name>.md
+   git add -A
+   git commit -m "chore(workflow): start <feature-name>"
+   ```
+2. Read the feature doc. Understand all acceptance criteria and their UUIDs.
+3. Add an `## Architecture` section to the feature doc:
    - Module structure (which files you will create/modify)
    - Key decisions — write an ADR for any non-obvious choice:
      ```
@@ -47,10 +53,10 @@ When a new feature lands in `docs/features/in-progress/`:
      Alternatives considered: <what you rejected and why>
      ```
    - Build changes that need PO approval: new runtime deps, new packages, changed entry points
-3. If build changes need PO approval, ask before proceeding. Tooling changes (coverage, lint rules, test config) are your autonomy.
-4. Update `pyproject.toml` and project structure as needed.
-5. Run `task test` — must still pass.
-6. Commit: `feat(bootstrap): configure build for <feature-name>`
+4. If build changes need PO approval, ask before proceeding. Tooling changes (coverage, lint rules, test config) are your autonomy.
+5. Update `pyproject.toml` and project structure as needed.
+6. Run `uv run task test` — must still pass.
+7. Commit: `feat(bootstrap): configure build for <feature-name>`
 
 ### Step 3 — TEST FIRST
 Load `skill tdd`. Write failing tests mapped 1:1 to each UUID acceptance criterion.
@@ -59,7 +65,8 @@ Commit: `test(<feature-name>): add failing tests for all acceptance criteria`
 ### Step 4 — IMPLEMENT
 Load `skill implementation`. Make tests green one at a time.
 Commit after each test goes green: `feat(<feature-name>): implement <component>`
-Self-verify: run `task test` and `timeout 10s task run` after each commit.
+Self-verify after each commit: run all four commands in the Self-Verification block below.
+If you discover a missing behavior during implementation, load `skill extend-criteria`.
 
 ### After reviewer approves (Step 5)
 Load `skill pr-management` and `skill git-release` as needed.
@@ -100,10 +107,10 @@ When making a non-obvious architecture decision, write a brief ADR in the featur
 
 Before declaring any step complete and before requesting reviewer verification, run:
 ```bash
-task lint                # must exit 0
-task static-check        # must exit 0, 0 errors
-task test                # must exit 0, all tests pass
-timeout 10s task run     # must exit 0 or 124; exit 124 = timeout (infinite loop) = fix it
+uv run task lint                # must exit 0
+uv run task static-check        # must exit 0, 0 errors
+uv run task test                # must exit 0, all tests pass
+timeout 10s uv run task run     # must exit non-124; exit 124 = timeout (infinite loop) = fix it
 ```
 
 Do not hand off broken work to the reviewer.
@@ -112,9 +119,8 @@ Do not hand off broken work to the reviewer.
 
 ```
 <package>/             # production code (named after the project)
-tests/
-  unit/                # @pytest.mark.unit — isolated, one function/class
-  integration/         # @pytest.mark.integration — multiple components
+tests/                 # flat layout — no unit/ or integration/ subdirectories
+  <name>_test.py       # marker (@pytest.mark.unit/integration) determines category
 pyproject.toml         # version, deps, tasks, test config
 ```
 
@@ -127,6 +133,7 @@ pyproject.toml         # version, deps, tasks, test config
 - `session-workflow` — read/update TODO.md at session boundaries
 - `tdd` — write failing tests with UUID traceability (Step 3)
 - `implementation` — Red-Green-Refactor cycle (Step 4)
+- `extend-criteria` — add gap criteria discovered during implementation or review
 - `code-quality` — ruff, pyright, coverage standards
 - `pr-management` — create PRs with conventional commits
 - `git-release` — calver versioning and themed release naming
 
@@ -53,28 +53,43 @@ Every criterion must have a UUID (generate with `python -c "import uuid; print(u
 
 ```markdown
 - `<uuid>`: <Short description>.
+  Source: <stakeholder | po | developer | reviewer | bug>
+
   Given: <precondition>
   When: <action>
   Then: <expected outcome>
-  Test strategy: unit | integration
 ```
 
 All UUIDs must be unique. Every story must have at least one criterion. Every criterion must be independently testable.
 
+**Source field** (mandatory): records who originated this criterion.
+- `stakeholder` — an external stakeholder gave this requirement to the PO
+- `po` — the PO originated this criterion independently
+- `developer` — a gap found during Step 4 implementation
+- `reviewer` — a gap found during Step 5 verification
+- `bug` — a post-merge regression; the feature doc was reopened
+
+When adding criteria discovered after initial scope, load `skill extend-criteria`.
+
 ## Feature Document Structure
 
+Filename: `<verb>-<object>.md` — imperative verb first, kebab-case, 2–4 words.
+Examples: `display-version.md`, `authenticate-user.md`, `export-metrics-csv.md`
+Title matches: `# Feature: <Verb> <Object>` in Title Case.
+
 ```markdown
-# Feature: <Name>
+# Feature: <Verb> <Object>
 
 ## User Stories
 - As a <role>, I want <goal> so that <benefit>
 
 ## Acceptance Criteria
 - `<uuid>`: <Short description>.
+  Source: <stakeholder | po>
+
   Given: ...
   When: ...
   Then: ...
-  Test strategy: unit | integration
 
 ## Notes
 <constraints, risks, out-of-scope items>
 
@@ -55,10 +55,10 @@ Load `skill verify`. Run all commands, check all criteria, produce a written rep
 Run these in order. If any fails, stop and report — do not continue to the next:
 
 ```bash
-task lint                # must exit 0
-task static-check        # must exit 0, 0 errors
-task test                # must exit 0, 0 failures, coverage >= 100%
-timeout 10s task run     # must exit 0 or 124; exit 124 = timeout (infinite loop) = FAIL
+uv run task lint                # must exit 0
+uv run task static-check        # must exit 0, 0 errors
+uv run task test                # must exit 0, 0 failures, coverage >= 100%
+timeout 10s uv run task run     # must exit non-124; exit 124 = timeout (infinite loop) = FAIL
 ```
 
 ## Code Review Checklist
@@ -95,7 +95,7 @@ After all commands pass, review source code for:
 9. No getters/setters (tell, don't ask)
 
 **Tests**
-- [ ] Every test has UUID docstring with Given/When/Then
+- [ ] Every test has UUID-only first line docstring, blank line, then Given/When/Then
 - [ ] Tests assert behavior, not structure
 - [ ] Every acceptance criterion has a mapped test
 - [ ] No test verifies isinstance, type(), or internal attributes
@@ -111,10 +111,10 @@ After all commands pass, review source code for:
 ## Step 5 Verification Report
 
 ### Commands
-- task lint: PASS | FAIL — <output if fail>
-- task static-check: PASS | FAIL — <errors if fail>
-- task test: PASS | FAIL — <failures/coverage if fail>
-- timeout 10s task run: PASS | FAIL | TIMEOUT — <error or "process did not exit within 10s" if fail>
+- uv run task lint: PASS | FAIL — <output if fail>
+- uv run task static-check: PASS | FAIL | NOT RUN — <errors if fail, or "stopped after previous failure">
+- uv run task test: PASS | FAIL | NOT RUN — <failures/coverage if fail, or "stopped after previous failure">
+- timeout 10s uv run task run: PASS | FAIL | TIMEOUT | NOT RUN — <error or "process did not exit within 10s" if fail, or "stopped after previous failure">
 
 ### Code Review
 - PASS | FAIL: <finding with file:line reference>
 
@@ -0,0 +1,102 @@
+---
+name: extend-criteria
+description: Add acceptance criteria discovered after scope is written — gaps found during implementation or review, and post-merge defects
+version: "1.0"
+author: any
+audience: developer, reviewer, product-owner
+workflow: feature-lifecycle
+---
+
+# Extend Criteria
+
+This skill is loaded when any agent discovers a missing behavior that is not covered by the existing acceptance criteria. It provides the decision rule, UUID assignment, and commit protocol for adding new criteria mid-flight or post-merge.
+
+## When to Use
+
+- **Developer (Step 4)**: implementation reveals an untested behavior
+- **Reviewer (Step 5)**: code review reveals an observable behavior with no acceptance criterion
+- **Post-merge**: a defect is found in production and a regression criterion must be added
+
+Do not use this skill to scope new features. New observable behaviors that go beyond the current feature's user stories must be escalated to the PO.
+
+## Decision Rule: Is This a Gap or a New Feature?
+
+Ask: "Does this behavior fall within the intent of the current user stories?"
+
+| Situation | Action |
+|---|---|
+| Edge case or error path within approved scope | Add criterion with `Source: developer` or `Source: reviewer` |
+| New observable behavior users did not ask for | Escalate to PO; do not add criterion unilaterally |
+| Post-merge regression (the feature was accepted and broke later) | Reopen feature doc; add criterion with `Source: bug` |
+| Behavior already present but criterion was never written | Add criterion with appropriate `Source:` |
+
+When in doubt, ask the PO before adding.
+
+## Criterion Format
+
+All criteria use this format (mandatory `Source:` field):
+
+```markdown
+- `<uuid>`: <Short description ending with a period>.
+  Source: <source>
+
+  Given: <precondition>
+  When: <action>
+  Then: <single observable outcome>
+```
+
+**Source values** (choose exactly one):
+- `stakeholder` — an external stakeholder gave this requirement to the PO
+- `po` — the PO originated this criterion independently
+- `developer` — a gap found during Step 4 implementation
+- `reviewer` — a gap found during Step 5 verification
+- `bug` — a post-merge regression; the feature doc was reopened
+
+**Rules**:
+- UUID must be unique across the entire project
+- Generate: `python -c "import uuid; print(uuid.uuid4())"`
+- `Then` must be a single observable, measurable outcome — no "and"
+- Do not add `Source:` retroactively to criteria that predate this field
+
+## Procedure by Role
+
+### Developer (Step 4)
+
+1. Determine whether this is a gap within scope or a new feature (use the decision table above)
+2. If it is within scope:
+   a. Add the criterion to the feature doc with `Source: developer`
+   b. Write the failing test for it (load `skill tdd`)
+   c. Make it green (continue Red-Green-Refactor)
+   d. Commit: `test(<feature-name>): add gap criterion <uuid>`
+3. If it is out of scope: write a note in TODO.md under `## Next`, flag it for the PO after Step 5
+
+### Reviewer (Step 5)
+
+1. Determine whether this is a gap within scope or a new feature
+2. If it is within scope:
+   - Add the criterion to the feature doc with `Source: reviewer`
+   - Record in the REJECTED report: "Added criterion `<uuid>` — developer must implement before resubmitting"
+3. If it is out of scope:
+   - Do not add the criterion
+   - Note it in the report as a future backlog item
+
+### Post-merge Defect
+
+1. Move the feature doc back to in-progress:
+   ```bash
+   mv docs/features/completed/<name>.md docs/features/in-progress/<name>.md
+   git add -A
+   git commit -m "chore(workflow): reopen <name> for bug fix"
+   ```
+2. Add the new criterion with `Source: bug`
+3. Return to Step 3 (write failing test) then Step 4 (implement) then Step 5 (verify) then Step 6 (accept)
+4. Update TODO.md to reflect the reopened feature at the correct step
+
+## Checklist
+
+Before committing a new criterion:
+- [ ] UUID is unique (search: `grep -r "<uuid>" docs/features/` and `grep -r "<uuid>" tests/`)
+- [ ] `Source:` value is one of the five valid values
+- [ ] `Then` is a single, observable outcome (no "and")
+- [ ] Blank line between `Source:` line and `Given:`
+- [ ] A corresponding test will be written (or already exists)
@@ -102,7 +102,7 @@ def register_user(email: EmailAddress, repo: UserRepository) -> "User":
 ## RED — Confirm the Test Fails
 
 ```bash
-pytest tests/unit/<file>_test.py::test_<name> -v
+uv run pytest tests/<file>_test.py::test_<name> -v
 ```
 
 Expected: `FAILED` or `ERROR`. If it passes before you've written code, the test is wrong — fix it.
@@ -116,8 +116,8 @@ Write the least code that makes the test pass. Apply during GREEN:
 Do NOT apply during GREEN: DRY, SOLID, Object Calisthenics — those come in refactor.
 
 ```bash
-pytest tests/unit/<file>_test.py::test_<name> -v   # must be PASSED
-task test                                            # must all still pass
+uv run pytest tests/<file>_test.py::test_<name> -v   # must be PASSED
+uv run task test                                      # must all still pass
 ```
 
 ## REFACTOR — Apply Principles (in priority order)
@@ -137,10 +137,12 @@ task test                                            # must all still pass
 4. **Type hints**: add/fix type annotations on all public functions and classes
 5. **Docstrings**: Google-style on all public functions and classes
 
+> **Note**: `uv run task test` runs `--doctest-modules`, which executes code examples embedded in source docstrings. Keep `Examples:` blocks in Google-style docstrings valid and executable. If an example should not be run, mark it with `# doctest: +SKIP`.
+
 ```bash
-task test          # must still pass
-task lint          # must exit 0
-task static-check  # must exit 0
+uv run task test          # must still pass
+uv run task lint          # must exit 0
+uv run task static-check  # must exit 0
 ```
 
 ## COMMIT
@@ -157,10 +159,10 @@ Then move to the next failing test.
 After all tests are green, before telling the reviewer you are ready:
 
 ```bash
-task lint                # exit 0
-task static-check        # exit 0, 0 errors
-task test                # exit 0, all pass, coverage 100%
-timeout 10s task run     # exit 0 or non-124; exit 124 = hung process = fix it
+uv run task lint                # exit 0
+uv run task static-check        # exit 0, 0 errors
+uv run task test                # exit 0, all pass, coverage 100%
+timeout 10s uv run task run     # exit non-124; exit 124 = hung process = fix it
 ```
 
 All four must pass. Do not hand off broken work.