Merged
118 changes: 37 additions & 81 deletions .deepreview
@@ -11,6 +11,7 @@ prompt_best_practices:
  - "plugins/**/skills/**/*.md"
  - "learning_agents/skills/**/*.md" # learning_agents plugin skills are prompt-heavy
  - "learning_agents/agents/**/*.md" # agent persona definitions
+ - "plugins/**/agents/**/*.md" # plugin agent definitions are prompt-heavy
  - "platform/**/*.md"
  - "src/deepwork/standard_jobs/**/*.md"
  - "library/jobs/**/*.md" # library job step instructions are prompt-heavy files
@@ -118,96 +119,47 @@ requirements_traceability:
instructions: |
Review the changed files for requirements traceability.

- This project keeps formal requirements in `doc/specs/` organized by domain.
- Each file follows the naming pattern `{PREFIX}-REQ-NNN-<topic>.md` where
- PREFIX is one of: DW-REQ, JOBS-REQ, REVIEW-REQ, LA-REQ, PLUG-REQ.
- Requirements are individually numbered (e.g. JOBS-REQ-004.1). Requirements
- must be validated by either automated tests OR DeepWork review rules.
+ Requirements live in `doc/specs/` as `{PREFIX}-REQ-NNN-<topic>.md`
+ (prefixes: DW-REQ, JOBS-REQ, REVIEW-REQ, LA-REQ, PLUG-REQ), with
+ individually numbered items (e.g. JOBS-REQ-004.1). Each requirement
+ must be validated by automated tests, DeepSchemas, or `.deepreview` rules.

## Choosing the right validation mechanism

- Choosing the right mechanism is critical. The wrong choice creates
- false confidence (a passing test that doesn't actually verify anything)
- or wastes reviewer judgment on something a machine can check exactly.
-
- **Use anonymous DeepSchemas** (`.deepschema.<filename>.yml`) when
- requirements target a specific file — whether structural or semantic:
- - "This config file MUST include a timeout field" — structural check
- for one file (use `json_schema_path` or `verification_bash_command`
- for exact verification)
- - "The learn workflow MUST accept X and Y step arguments" — the
- requirement governs a specific YAML file's content
- - "Skill MUST instruct the agent to do X" — judgment-based check
- of prose in one specific file
- - "The error message MUST include a suggestion for how to fix the
- problem" — governs a specific source file's behavior
-
- Anonymous DeepSchemas provide both write-time validation and review-time
- checks, and they keep the requirement co-located with the file it governs.
- **Prefer them over both tests and `.deepreview` rules whenever the
- requirement targets a specific file** rather than a class of files.
- DeepSchemas can enforce structural requirements via `json_schema_path`
- or `verification_bash_command` just as precisely as a test, while also
- supporting judgment-based requirements in the same schema.
-
- **Use automated tests** (`tests/`) when the requirement specifies a
- concrete, machine-verifiable fact that spans multiple files or is not
- tied to a single file's content:
- - File A is byte-identical to file B
- - A Python function returns the correct value for given inputs
- - A CLI command produces expected output
- - A data structure assembled from multiple sources has a required shape
-
- Tests reference requirement IDs via docstrings and traceability comments.
-
- **Use `.deepreview` rules** when evaluating the requirement requires
- judgment AND applies broadly across many files of a type:
- - "All prompts MUST use the terms X, Y, and Z" — a general standard
- that applies to every file matching a glob pattern
- - "Code MUST follow pattern Y" — does the implementation match the
- spirit of the pattern across multiple files?
- - "Documentation MUST stay in sync with code" — are the descriptions
- still accurate after changes?
-
- Both `.deepreview` rules and DeepSchemas reference requirement IDs in
- their `description`, `instructions`, or `requirements` fields.
+ Pick the mechanism that matches the requirement type. The wrong choice
+ creates false confidence or wastes reviewer judgment.
+
+ **Anonymous DeepSchemas** (`.deepschema.<filename>.yml`): when the
+ requirement targets a specific file (structural or semantic). Use
+ `json_schema_path` / `verification_bash_command` for exact checks,
+ or judgment-based criteria for prose. Prefer DeepSchemas over tests
+ and `.deepreview` rules for single-file requirements.
+
+ **Automated tests** (`tests/`): for concrete, machine-verifiable facts
+ spanning multiple files (function return values, CLI output, cross-file
+ structure). Tests reference requirement IDs via docstrings/comments.
+
+ **`.deepreview` rules**: when evaluation requires judgment AND applies
+ broadly across many files of a type (coding standards, documentation
+ accuracy, prompt conventions). Rules and DeepSchemas reference
+ requirement IDs in `description`, `instructions`, or `requirements`.

## Anti-patterns to flag

- **Fragile keyword tests for judgment-based requirements.** A test that
- checks `"reuse" in content.lower()` to validate "MUST instruct the
- agent to reuse existing rules" is not deterministic verification — it
- is a keyword search pretending to be one. The word "reuse" could appear
- in an unrelated sentence, be negated ("do NOT reuse"), or be absent
- while the instruction clearly conveys reuse through other wording.
- These requirements need a review rule that can read and evaluate the
- instruction's meaning. Other examples of this anti-pattern:
- - `"parallel" in content` for "MUST launch tasks in parallel"
- - `"again" in content or "repeat" in content` for "MUST re-run after changes"
- - `"without asking" in content` for "MUST automatically apply obvious fixes"
-
- **Review rules for machine-verifiable requirements.** A review rule
- that asks a reviewer "check whether the config file contains
- `--platform claude`" is wasting reviewer judgment on something
- `assert "--platform" in args` can verify exactly. If the requirement
- specifies a concrete value, path, or structure, use a test.
-
- See doc/specs/validating_requirements_with_rules.md for more information.
+ - **Fragile keyword tests for judgment requirements**: e.g.
+ `"parallel" in content` for "MUST launch tasks in parallel" — use
+ a review rule instead. See doc/specs/validating_requirements_with_rules.md.
+ - **Review rules for machine-verifiable requirements**: e.g. asking a
+ reviewer to check for a specific flag — use a test instead.

## Review checklist

- 1. Check that any new or changed end-user functionality has a
- corresponding requirement in `doc/specs/`.
- 2. Check that every requirement touched by this change has at least
- one automated test OR at least one `.deepreview` rule validating
- it. **Verify the mechanism matches the requirement type** — flag
- keyword-search tests used for judgment requirements, and flag
- review rules used for machine-verifiable requirements.
- 3. Flag any test modifications where the underlying requirement did
- not also change.
- 4. For rule-validated requirements, verify the `.deepreview` rule's
- description or instructions reference the requirement ID and that
- the rule's scope covers the requirement's intent.
+ 1. New/changed end-user functionality has a requirement in `doc/specs/`.
+ 2. Every touched requirement has a test, DeepSchema, or `.deepreview`
+ rule. Verify the mechanism matches the requirement type.
+ 3. Flag test modifications where the underlying requirement didn't change.
+ 4. For rule-validated requirements, verify the rule references the
+ requirement ID and its scope covers the requirement's intent.

Produce a structured review with Coverage Gaps, Test Stability
Violations, and a Summary with PASS/FAIL verdicts.
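
To make the "fragile keyword test" anti-pattern in this hunk concrete, here is a minimal Python sketch. The helper `keyword_check` and both sample sentences are hypothetical illustrations, not code from this repository; the check itself mirrors the `"reuse" in content.lower()` pattern quoted in the removed instructions.

```python
def keyword_check(content: str) -> bool:
    """Hypothetical stand-in for a keyword-style test assertion."""
    return "reuse" in content.lower()


# A negated instruction still contains the keyword, so the check passes
# even though the requirement ("MUST instruct the agent to reuse existing
# rules") is violated.
negated = "Do NOT reuse existing rules; always write new ones."

# A paraphrase satisfies the requirement without using the keyword,
# so the check fails even though the instruction is correct.
paraphrased = "Prefer rules that already exist over writing new ones."

print(keyword_check(negated))      # True  (false positive)
print(keyword_check(paraphrased))  # False (false negative)
```

Both outcomes are wrong relative to the requirement's meaning, which is why the instructions route this kind of requirement to a judgment-based review rule rather than a test.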
@@ -369,6 +321,10 @@ shell_code_review:
section markers) still accurate after the changes? Flag any comments
that describe behavior that no longer matches the code.

+ Output Format:
+ - PASS: No issues found.
+ - FAIL: List each issue with file, line, severity (high/medium/low), and a concise description.
+
agents_md_claude_md_symlink:
description: "Ensure every AGENTS.md file has a sibling CLAUDE.md symlink pointing to it, because Claude Code reads CLAUDE.md but ignores AGENTS.md."
match:
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

+ - Renamed default reviewer agent from `reviewer` to `deepwork:reviewer` (plugin-namespaced) in review instructions output
+ - `/review` skill now checks for `deepwork:reviewer` agent availability before proceeding and directs users to `/reload-plugins` if missing
+
### Fixed

### Removed
18 changes: 8 additions & 10 deletions README_REVIEWS.md
@@ -305,20 +305,18 @@ This:
### What the output looks like

```
- Invoke the following list of Tasks in parallel.
+ Invoke the following list of Agents in parallel.
+ IMPORTANT: Do NOT read the prompt files yourself. Pass the prompt field directly to each agent — the @file references are expanded automatically.

- name: "python_review review of src/app.py"
- description: Review python_review
- subagent_type: reviewer
+ description: Review python_review
+ subagent_type: deepwork:reviewer
prompt: "@.deepwork/tmp/review_instructions/7142141.md"

- name: "python_review review of src/lib.py"
- description: Review python_review
- subagent_type: reviewer
+ description: Review python_review
+ subagent_type: deepwork:reviewer
prompt: "@.deepwork/tmp/review_instructions/6316224.md"

- name: "db_migration_safety review of 2 files"
- description: Review db_migration_safety
+ description: Review db_migration_safety
subagent_type: db-expert
prompt: "@.deepwork/tmp/review_instructions/3847291.md"
```
Expand Down Expand Up @@ -535,6 +533,6 @@ Patterns follow standard glob syntax, evaluated relative to the `.deepreview` fi

## Contributor setup

- By default, `/review` dispatches each review task to the `reviewer` subagent shipped with the DeepWork Claude plugin (`plugins/claude/agents/reviewer.md`). If you are developing against this repo with only the dev MCP server (`uv run deepwork serve`) and no plugin installed, Claude Code cannot resolve `subagent_type: reviewer` and review dispatch will fail.
+ By default, `/review` dispatches each review task to the `deepwork:reviewer` subagent shipped with the DeepWork Claude plugin (`plugins/claude/agents/reviewer.md`). If you are developing against this repo with only the dev MCP server (`uv run deepwork serve`) and no plugin installed, Claude Code cannot resolve `subagent_type: deepwork:reviewer` and review dispatch will fail.

To run reviews as a contributor, install the plugin alongside the dev server: `claude plugin marketplace add Unsupervisedcom/deepwork && claude plugin install deepwork@deepwork-plugins`. The plugin ships the reviewer agent file, and either MCP server prefix (`mcp__deepwork-dev__*` or `mcp__plugin_deepwork_deepwork__*`) will resolve the reviewer's tools.
@@ -32,7 +32,7 @@ The `deepwork review` CLI command orchestrates the full DeepWork Reviews pipelin
3. For each review task, the output MUST include fields matching the Claude Code `Agent` tool parameters:
a. ~~DEPRECATED~~ ~~A `name` field formatted as `"{scope_prefix}{rule_name} review of {file_or_scope}"`.~~
b. A `description` field with a short (3-5 word) summary for the task (e.g., `"Review {rule_name}"`). When the rule comes from a `.deepreview` in a subdirectory, the description MUST include the scope (e.g., `"Review my_job/{rule_name}"`).
- c. A `subagent_type` field set to the agent persona name (from the rule's `agent.claude` value) or `"reviewer"` if no persona is specified. `"reviewer"` refers to the default reviewer subagent shipped by the DeepWork Claude plugin (`plugins/claude/agents/reviewer.md`).
+ c. A `subagent_type` field set to the agent persona name (from the rule's `agent.claude` value) or `"deepwork:reviewer"` if no persona is specified. `"deepwork:reviewer"` refers to the default reviewer subagent shipped by the DeepWork Claude plugin (`plugins/claude/agents/reviewer.md`), using the plugin-namespaced agent name.
d. A `prompt` field referencing the instruction file path relative to the project root, prefixed with `@` (e.g., `@.deepwork/tmp/review_instructions/7142141.md`).
4. The instruction file paths MUST be relative to the project root.
5. When running inside a git worktree, the formatter MUST resolve `@file` paths relative to the main working tree root (not the worktree root), because Claude Code expands `@file` references relative to the main repo root. The main repo root MUST be detected via `git rev-parse --path-format=absolute --git-common-dir`. If git is unavailable or the directory is not a worktree, the formatter MUST fall back to using the project root.
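
Requirement 5 above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the project's formatter code: the function name `resolve_main_repo_root` is hypothetical, it assumes a Git version that supports `--path-format`, and it assumes the common Git directory is a `.git` folder sitting directly under the main working tree root.

```python
import subprocess
from pathlib import Path


def resolve_main_repo_root(project_root: Path) -> Path:
    """Hypothetical helper: find the main working tree root, or fall back."""
    try:
        completed = subprocess.run(
            ["git", "rev-parse", "--path-format=absolute", "--git-common-dir"],
            cwd=project_root,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the directory does not exist, or it is not a repo.
        return project_root

    common_dir = Path(completed.stdout.strip())
    # Assumption: the shared git directory is "<main root>/.git", so its
    # parent is the main working tree root. Any other layout falls back.
    if common_dir.name == ".git":
        return common_dir.parent
    return project_root
```

In a linked worktree the common directory belongs to the main checkout, so the parent of `.git` is the main repo root rather than the worktree root. In an ordinary checkout the same call simply returns that checkout's own root.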
1 change: 1 addition & 0 deletions doc/specs/deepwork/review/REVIEW-REQ-007-plugin-skills.md
@@ -17,6 +17,7 @@ DeepWork Reviews is delivered to users via the Claude Code plugin. The plugin sh
7. The skill MUST instruct the agent to use AskUserQuestion for findings that involve trade-offs or subjective judgment.
8. The skill MUST instruct the agent to re-run the review after making changes, repeating until no further actionable findings remain.
9. The skill MUST route configuration requests (creating or modifying `.deepreview` files) to the `configure_reviews` skill.
+ 10. The skill MUST verify that the `deepwork:reviewer` agent is available before running reviews. If the agent is not available, the skill MUST stop and instruct the user to run `/reload-plugins` to pick up the latest plugin updates.

### REVIEW-REQ-007.2: Configure Reviews Skill

2 changes: 1 addition & 1 deletion plugins/claude/agents/reviewer.md
@@ -1,5 +1,5 @@
---
- name: reviewer
+ name: deepwork:reviewer
description: Minimal review subagent for DeepWork review tasks. Reads a supplied instruction file, performs the review against the criteria in that file, and reports results via the DeepWork MCP mark_review_as_passed tool. Use when dispatching parallel review tasks from .deepreview rules or workflow quality gates.
model: sonnet
color: cyan
8 changes: 8 additions & 0 deletions plugins/claude/skills/review/SKILL.md
@@ -16,6 +16,14 @@ Run automated code reviews on the current branch based on `.deepreview` config f

Only proceed past this section if the user wants to **run** reviews.

+ ## Pre-flight — Verify Agent Availability
+
+ Before running reviews, check that the `deepwork:reviewer` agent is available. If it does not appear in the agents list (i.e., the Agent tool does not list `deepwork:reviewer` as a valid `subagent_type`), **STOP** and tell the user:
+
+ > The `deepwork:reviewer` agent is not available. Please run `/reload-plugins` to pick up the latest plugin updates, then try again.
+
+ Do not proceed with reviews until the agent is available.
+
## How to Run

1. Call the `mcp__deepwork__get_review_instructions` tool directly:
2 changes: 1 addition & 1 deletion src/deepwork/review/formatter.py
@@ -105,7 +105,7 @@ def format_for_claude(
rel_path = file_path

description = _task_description(task)
- subagent_type = task.agent_name or "reviewer"
+ subagent_type = task.agent_name or "deepwork:reviewer"

lines.append(f"description: {description}")
lines.append(f"\tsubagent_type: {subagent_type}")
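
The fallback changed in this hunk can be exercised in isolation with a small sketch. The `Task` dataclass, the `DEFAULT_SUBAGENT` constant, and `subagent_for` below are hypothetical stand-ins for illustration; only the `task.agent_name or "deepwork:reviewer"` expression comes from the diff.

```python
from dataclasses import dataclass
from typing import Optional

# Default taken from the diff; the constant name itself is an assumption.
DEFAULT_SUBAGENT = "deepwork:reviewer"


@dataclass
class Task:
    """Hypothetical stand-in for the real review task type."""
    rule_name: str
    agent_name: Optional[str] = None


def subagent_for(task: Task) -> str:
    # `or` falls back for None and also for an empty string.
    return task.agent_name or DEFAULT_SUBAGENT


print(subagent_for(Task("python_review")))                     # deepwork:reviewer
print(subagent_for(Task("db_migration_safety", "db-expert")))  # db-expert
```

Because `or` treats any falsy value as missing, an empty `agent_name` would also resolve to the namespaced default, which may or may not be the intended behavior for malformed rule configs.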
2 changes: 1 addition & 1 deletion tests/unit/review/test_formatter.py
@@ -46,7 +46,7 @@ def test_default_subagent_type_when_no_agent(self, tmp_path: Path) -> None:
task = _make_task(agent_name=None)
file_path = tmp_path / "instructions.md"
result = format_for_claude([(task, file_path)], tmp_path)
- assert "subagent_type: reviewer" in result
+ assert "subagent_type: deepwork:reviewer" in result

# THIS TEST VALIDATES A HARD REQUIREMENT (REVIEW-REQ-006.3.3c).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES