5 changes: 3 additions & 2 deletions AGENTS.md
@@ -208,7 +208,8 @@ deepwork/
│ │ └── deepplan/
│ ├── standard_schemas/ # Built-in DeepSchema definitions
│ ├── review/ # DeepWork Reviews system (.deepreview pipeline)
│ ├── schemas/ # Definition schemas (deepreview, deepschema, doc_spec)
│ ├── schemas/ # Definition schemas (deepreview, deepschema, doc_spec, tool_requirements)
│ ├── tool_requirements/ # Tool requirements policy enforcement (config, discovery, matcher, evaluator, engine, cache, sidecar)
│ └── utils/ # Utilities (fs, git, yaml, validation)
├── platform/ # Shared platform-agnostic content
│ └── skill-body.md # Canonical skill body (source of truth)
@@ -227,7 +228,7 @@ deepwork/
│ │ │ ├── new_user/SKILL.md
│ │ │ ├── record/SKILL.md
│ │ │ └── review/SKILL.md
│ │ ├── hooks/ # hooks.json, post_commit_reminder.sh, post_compact.sh, startup_context.sh, deepschema_write.sh
│ │ ├── hooks/ # hooks.json, post_commit_reminder.sh, post_compact.sh, startup_context.sh, deepschema_write.sh, tool_requirements.sh
│ │ └── .mcp.json # MCP server config
│ └── gemini/ # Gemini CLI extension
│ └── skills/deepwork/SKILL.md
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -9,13 +9,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Tool requirements policy enforcement system: define RFC 2119-style rules in `.deepwork/tool_requirements/*.yml` to govern AI agent tool calls, with LLM-based semantic evaluation, appeal mechanism, and 1-hour TTL caching

### Changed

- Renamed default reviewer agent from `reviewer` to `deepwork:reviewer` (plugin-namespaced) in review instructions output
- `/review` skill now checks for `deepwork:reviewer` agent availability before proceeding and directs users to `/reload-plugins` if missing

### Fixed

- Settings schema missing `if`, `asyncRewake`, `once`, `shell` fields on hook definitions and missing `StopFailure`, `PermissionDenied`, `TaskCreated`, `FileChanged`, `CwdChanged` hook event types

### Removed
## [0.13.7] - 2026-04-14

13 changes: 11 additions & 2 deletions doc/architecture.md
@@ -93,7 +93,16 @@ deepwork/ # DeepWork tool repository
│ ├── schemas/ # Definition schemas
│ │ ├── deepreview_schema.json
│ │ ├── deepschema_schema.json
│ │ └── doc_spec_schema.py
│ │ ├── doc_spec_schema.py
│ │ └── tool_requirements_schema.json
│ ├── tool_requirements/ # Tool requirements policy enforcement
│ │ ├── cache.py # In-memory TTL cache for approved calls
│ │ ├── config.py # ToolPolicy/Requirement dataclasses, parser
│ │ ├── discovery.py # Load policies from .deepwork/tool_requirements/
│ │ ├── engine.py # Check + appeal orchestration
│ │ ├── evaluator.py # LLM evaluator (Haiku) for requirement checking
│ │ ├── matcher.py # Match policies to tool calls
│ │ └── sidecar.py # HTTP sidecar server for hook communication
│ └── utils/
│ ├── fs.py
│ ├── git.py
@@ -119,7 +128,7 @@ deepwork/ # DeepWork tool repository
│ │ │ ├── new_user/SKILL.md
│ │ │ ├── record/SKILL.md
│ │ │ └── review/SKILL.md
│ │ ├── hooks/ # hooks.json, post_commit_reminder.sh, post_compact.sh, startup_context.sh, deepschema_write.sh
│ │ ├── hooks/ # hooks.json, post_commit_reminder.sh, post_compact.sh, startup_context.sh, deepschema_write.sh, tool_requirements.sh
│ │ └── .mcp.json # MCP server config
│ └── gemini/ # Gemini CLI extension
│ └── skills/deepwork/SKILL.md
26 changes: 25 additions & 1 deletion doc/mcp_interface.md
@@ -10,7 +10,7 @@ This document describes the Model Context Protocol (MCP) tools exposed by the De

## Tools

DeepWork exposes eleven MCP tools:
DeepWork exposes twelve MCP tools:

### 1. `get_workflows`

@@ -308,6 +308,29 @@ Retrieve the YAML content of a session-scoped job definition previously register
}
```

### 12. `appeal_tool_requirement`

Appeal a tool requirement policy denial. When a tool call is blocked by a tool requirement policy, call this to appeal specific failed checks by providing justifications. Some checks are marked `no_exception` and cannot be appealed. If the appeal succeeds, the tool call is cached as approved and you can retry the original tool call.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `tool_name` | `string` | Yes | The normalized tool name that was blocked |
| `tool_input` | `dict` | Yes | The exact tool_input that was blocked |
| `policy_justification` | `dict[string, string]` | Yes | Map of failed check names to justification strings |
| `session_id` | `string` | No | Session identifier (CLAUDE_CODE_SESSION_ID on Claude Code) |

#### Returns

```typescript
{
  passed: boolean;                  // Whether the appeal succeeded
  reason: string;                   // Explanation of result
  no_exception_blocked?: string[];  // Checks that cannot be appealed
}
```

---

## Shared Types
Expand Down Expand Up @@ -491,6 +514,7 @@ Add to your `.mcp.json`:

| Version | Changes |
|---------|---------|
| 2.4.0 | Added `appeal_tool_requirement` tool for appealing tool requirement policy denials with justifications. |
| 2.3.0 | Added `project_root` field to `ActiveStepInfo` — the absolute path to the MCP server's project root. Added `register_session_job` and `get_session_job` tools for transient session-scoped job definitions. Session jobs are discoverable by `start_workflow` via `session_id` lookup — they take priority over standard discovery. Added `deepplan` standard job with `create_deep_plan` workflow. |
| 2.2.0 | `session_id` is now optional (`str | None`) on `start_workflow` only. On Claude Code (platform `"claude"`), the server raises `ToolError` if omitted. On other platforms, omitting it auto-generates a stable UUID; callers use the returned `begin_step.session_id` for all subsequent calls. `finished_step`, `abort_workflow`, and `go_to_step` continue to require `session_id`. Added `inputs` optional parameter to `start_workflow` for passing step argument values directly at workflow start. Added `issue_detected` optional field to all tool responses — present when the server detects configuration issues at startup; instructs agent to suggest repair to the user. |
| 2.1.0 | Added `important_note` field to `StartWorkflowResponse` — instructs agents to clarify ambiguous user requests via `AskUserQuestion` when available. |
104 changes: 104 additions & 0 deletions doc/specs/deepwork/DW-REQ-012-tool-requirements.md
@@ -0,0 +1,104 @@
# DW-REQ-012: Tool Requirements Policy Enforcement

## Overview

The tool requirements system enforces RFC 2119-style policies on AI agent tool calls. Users define rules in `.deepwork/tool_requirements/*.yml` files. A PreToolUse hook checks these rules via an HTTP sidecar server using LLM-based semantic evaluation. Failed checks can be appealed via an MCP tool. Approved calls are cached with a TTL.

## DW-REQ-012.1: Policy File Format

1. Policy files MUST be YAML files located in `.deepwork/tool_requirements/` with a `.yml` extension.
2. Each policy file MUST be validated against the `tool_requirements_schema.json` JSON Schema.
3. The `tools` field MUST be a non-empty array of normalized tool names (e.g., `shell`, `write_file`, `edit_file`) or MCP tool names (e.g., `mcp__server__tool`).
4. The `requirements` field MUST be a mapping of requirement identifiers to objects containing a `rule` string (RFC 2119 statement) and an optional `no_exception` boolean (default: `false`).
5. The `match` field MAY be a mapping of tool_input parameter names to regex patterns for parameter-level filtering.
6. The `extends` field MAY be an array of policy file stems for inheritance.
7. The `summary` field MAY be a human-readable description of the policy.

## DW-REQ-012.2: Policy Discovery

1. The system MUST scan `.deepwork/tool_requirements/` for `*.yml` files (single-directory, no tree walk).
2. Files that fail to parse MUST be skipped with a warning logged — they MUST NOT prevent other policies from loading.
3. If the `.deepwork/tool_requirements/` directory does not exist, the system MUST return an empty policy list without error.

## DW-REQ-012.3: Policy Inheritance

1. When a policy lists `extends`, the system MUST merge parent requirements into the child.
2. Child requirements MUST override parent requirements with the same key.
3. Unknown parent names MUST be logged as warnings and skipped — they MUST NOT cause errors.
4. Circular inheritance MUST be detected and MUST NOT cause infinite loops.
5. Diamond inheritance (two parents sharing a common ancestor) MUST be handled correctly — the common ancestor's requirements MUST be included once.
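
The inheritance rules above can be sketched as a small recursive merge. This is a minimal illustration, not the project's `config.py`: the function name `resolve_requirements`, the in-memory `policies` mapping, and the choice that later parents override earlier ones on a key collision are all assumptions made for the example.

```python
from __future__ import annotations


def resolve_requirements(
    name: str,
    policies: dict[str, dict],
    _active: frozenset[str] = frozenset(),
) -> dict[str, dict]:
    """Return the merged requirements for the policy with file stem `name`.

    `policies` maps a file stem to {"extends": [...], "requirements": {...}}
    (a hypothetical in-memory shape used only for this sketch).
    """
    if name in _active:
        # Circular inheritance: stop descending instead of looping forever.
        return {}
    policy = policies.get(name)
    if policy is None:
        # Unknown parent: skipped (a real implementation would log a warning).
        return {}

    merged: dict[str, dict] = {}
    for parent in policy.get("extends", []):
        # Requirement keys are unique, so an ancestor reached through two
        # parents (diamond) contributes each requirement only once.
        merged.update(resolve_requirements(parent, policies, _active | {name}))
    # Child requirements override parent requirements with the same key.
    merged.update(policy.get("requirements", {}))
    return merged


policies = {
    "base": {"requirements": {"no_force": {"rule": "MUST NOT force-push."}}},
    "left": {"extends": ["base"], "requirements": {"tests": {"rule": "SHOULD run tests."}}},
    "right": {"extends": ["base", "missing"], "requirements": {}},
    "child": {
        "extends": ["left", "right"],
        "requirements": {"tests": {"rule": "MUST run tests."}},
    },
    "loop_a": {"extends": ["loop_b"], "requirements": {"a": {"rule": "MAY a."}}},
    "loop_b": {"extends": ["loop_a"], "requirements": {"b": {"rule": "MAY b."}}},
}
```

Here `child` inherits `no_force` once despite the diamond through `left` and `right`, overrides `tests`, and ignores the unknown parent `missing`; the `loop_a`/`loop_b` pair terminates because the set of policies currently being resolved is tracked.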

## DW-REQ-012.4: Policy Matching

1. A policy MUST match a tool call if the tool's normalized name is in the policy's `tools` list.
2. If the policy has a `match` dict, the policy MUST match only when at least one parameter regex matches a value in `tool_input` (via `re.search`).
3. If the policy has no `match` dict, it MUST match all calls to the listed tools.
4. Multiple policies MAY match a single tool call; all matched requirements MUST be merged.
5. If the same requirement key appears in multiple matched policies, the first occurrence MUST win.
6. Invalid regex patterns in `match` MUST be skipped without error.
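
A rough sketch of the matching rules, under the assumption that each `match` key names a `tool_input` parameter and its regex is applied to that parameter's value. The helper names `policy_matches` and `merge_matched` and the dict-based policy shape are illustrative, not taken from `matcher.py`.

```python
import re


def policy_matches(policy: dict, tool_name: str, tool_input: dict) -> bool:
    """Return True if `policy` applies to this tool call."""
    if tool_name not in policy["tools"]:
        return False
    match = policy.get("match")
    if not match:
        # No `match` dict: the policy covers every call to the listed tools.
        return True
    for param, pattern in match.items():
        if param not in tool_input:
            continue
        try:
            if re.search(pattern, str(tool_input[param])):
                return True  # at least one parameter regex matched
        except re.error:
            continue  # invalid regex patterns are skipped without error
    return False


def merge_matched(policies: list, tool_name: str, tool_input: dict) -> dict:
    """Merge requirements from every matching policy; first occurrence wins."""
    merged = {}
    for policy in policies:
        if policy_matches(policy, tool_name, tool_input):
            for key, requirement in policy["requirements"].items():
                merged.setdefault(key, requirement)
    return merged


example_policies = [
    {
        "tools": ["shell"],
        "match": {"command": r"git\s+push"},
        "requirements": {"no_force": {"rule": "MUST NOT use --force."}},
    },
    {
        "tools": ["shell"],
        "requirements": {
            "no_force": {"rule": "SHOULD NOT use --force."},
            "no_sudo": {"rule": "MUST NOT use sudo."},
        },
    },
    {
        "tools": ["shell"],
        "match": {"command": "("},  # deliberately invalid regex
        "requirements": {"broken": {"rule": "MAY be ignored."}},
    },
]
```

With these example policies, a `git push` command picks up `no_force` from the first policy and `no_sudo` from the second, while the policy with the invalid pattern never matches.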

## DW-REQ-012.5: Requirement Evaluation

1. Requirements MUST be evaluated by an LLM evaluator (Haiku by default) that considers RFC 2119 keywords.
2. `MUST`/`MUST NOT` violations MUST always result in failure.
3. `SHOULD`/`SHOULD NOT` violations MUST result in failure only when the tool call could be trivially modified to comply (e.g., adding a flag, choosing a different command) — the evaluator prompt MUST instruct the LLM to apply this criterion.
4. `MAY` requirements MUST always pass.
5. The evaluator MUST return a verdict for every requirement — requirements not evaluated MUST fail closed.
6. The evaluator MUST be encapsulated behind an abstract interface to allow implementation swapping.
7. Large `tool_input` values MUST be truncated to avoid exceeding LLM token limits.
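
Two of the evaluation rules (fail-closed verdicts and input truncation) are independent of the LLM call itself and can be sketched in isolation. The names `truncate_tool_input` and `fail_closed`, the 2000-character limit, and the verdict shape are assumptions for illustration only; the spec does not fix any of them.

```python
def truncate_tool_input(tool_input: dict, max_chars: int = 2000) -> dict:
    """Render each value as text and cap its length (limit is an assumed value)."""
    truncated = {}
    for key, value in tool_input.items():
        text = value if isinstance(value, str) else repr(value)
        if len(text) > max_chars:
            text = text[:max_chars] + "...[truncated]"
        truncated[key] = text
    return truncated


def fail_closed(requirement_ids: list, verdicts: dict) -> dict:
    """Ensure every requirement has a verdict; missing ones count as failures."""
    complete = {}
    for req_id in requirement_ids:
        complete[req_id] = verdicts.get(
            req_id,
            {"passed": False, "explanation": "No verdict returned; failing closed."},
        )
    return complete
```

Verdicts the evaluator returns for unknown requirement IDs are dropped here, since only the requested IDs are copied into the result.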

## DW-REQ-012.6: Check Flow

1. When a tool call is checked, the system MUST first check the cache — if approved, it MUST allow immediately.
2. If no policies match the tool call, it MUST be allowed.
3. If evaluation passes all requirements, the result MUST be cached and the call MUST be allowed.
4. If any requirements fail, the response MUST include ALL failures (not one at a time).
5. Each failure MUST include the requirement ID and an explanation.
6. `no_exception` requirements MUST be labeled as such in the failure message.
7. The failure message MUST include instructions for how to appeal via the `appeal_tool_requirement` MCP tool.
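
The check flow above might be orchestrated roughly as follows. This sketch assumes the matched requirements have already been merged, uses a plain `set` in place of the TTL cache, and takes the evaluator as an injected callable; `check_tool_call`, `accept_all`, and `reject_all` are hypothetical names, and the message wording is invented.

```python
import json
from typing import Callable


def check_tool_call(
    tool_name: str,
    tool_input: dict,
    requirements: dict,
    approved: set,
    evaluate: Callable[[dict, dict], dict],
) -> dict:
    """Return {"allowed": bool, "reason": str} for one tool call."""
    key = tool_name + ":" + json.dumps(tool_input, sort_keys=True)
    if key in approved:
        return {"allowed": True, "reason": "cached approval"}
    if not requirements:
        return {"allowed": True, "reason": "no matching policies"}

    verdicts = evaluate(requirements, tool_input)
    failures = []
    for req_id, req in requirements.items():
        # A requirement without a verdict fails closed.
        verdict = verdicts.get(req_id, {"passed": False, "explanation": "not evaluated"})
        if not verdict["passed"]:
            label = " (no_exception)" if req.get("no_exception") else ""
            failures.append(f"- {req_id}{label}: {verdict['explanation']}")

    if not failures:
        approved.add(key)
        return {"allowed": True, "reason": "all requirements passed"}

    # Report every failure at once, followed by appeal instructions.
    lines = ["Tool call blocked by tool requirements:"]
    lines.extend(failures)
    lines.append("To appeal, call the appeal_tool_requirement MCP tool.")
    return {"allowed": False, "reason": "\n".join(lines)}


def accept_all(requirements: dict, tool_input: dict) -> dict:
    """Stub evaluator that passes everything (stands in for the LLM)."""
    return {rid: {"passed": True, "explanation": "ok"} for rid in requirements}


def reject_all(requirements: dict, tool_input: dict) -> dict:
    """Stub evaluator that fails everything (stands in for the LLM)."""
    return {rid: {"passed": False, "explanation": "violated"} for rid in requirements}
```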

## DW-REQ-012.7: Appeal Mechanism

1. The system MUST provide an `appeal_tool_requirement` MCP tool.
2. The tool MUST accept `tool_name`, `tool_input`, and `policy_justification` (a dict mapping failed check IDs to justification strings).
3. `no_exception` requirements MUST NOT be appealable — appeals for them MUST be rejected immediately.
4. For appealable requirements, the evaluator MUST re-evaluate considering the provided justifications.
5. If the appeal succeeds, the result MUST be cached so the retried tool call passes the hook.
6. If the appeal fails, the response MUST list all still-failing requirements.
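
One possible reading of the appeal rules, sketched with the re-evaluation step injected as a callable. It assumes that any failed `no_exception` check rejects the whole appeal before the evaluator is consulted; `handle_appeal` and `require_justification` are hypothetical names, and the result keys mirror the `appeal_tool_requirement` return shape.

```python
from typing import Callable


def handle_appeal(
    failed: dict,
    policy_justification: dict,
    reevaluate: Callable[[dict, dict], dict],
) -> dict:
    """Decide an appeal for the requirements that failed the original check.

    `failed` maps requirement ID -> requirement dict; `reevaluate` returns a
    mapping of still-failing requirement IDs to explanations.
    """
    blocked = sorted(rid for rid, req in failed.items() if req.get("no_exception"))
    if blocked:
        # no_exception checks are rejected immediately, without re-evaluation.
        return {
            "passed": False,
            "reason": "Some failed checks are marked no_exception and cannot be appealed.",
            "no_exception_blocked": blocked,
        }

    still_failing = reevaluate(failed, policy_justification)
    if still_failing:
        details = "; ".join(f"{rid}: {why}" for rid, why in still_failing.items())
        return {"passed": False, "reason": f"Appeal rejected. Still failing: {details}"}
    # On success the caller would cache the approval so the retry passes the hook.
    return {"passed": True, "reason": "Appeal accepted."}


def require_justification(failed: dict, justification: dict) -> dict:
    """Stub re-evaluator: a check stays failed if no justification was given."""
    return {
        rid: "no justification provided"
        for rid in failed
        if not justification.get(rid, "").strip()
    }
```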

## DW-REQ-012.8: Caching

1. Approved tool calls MUST be cached with a 1-hour TTL.
2. The cache key MUST be deterministic, derived from the tool name and tool input.
3. Expired cache entries MUST be evicted on lookup.
4. The cache MUST be in-memory within the sidecar server process.
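
A minimal in-memory TTL cache along these lines could look like the sketch below. The class name `ApprovalCache`, the SHA-256 over sorted-key JSON for the key, and the injectable `clock` parameter are assumptions chosen for the example, not details confirmed by `cache.py`.

```python
import hashlib
import json
import time
from typing import Callable


class ApprovalCache:
    """In-memory cache of approved tool calls with a fixed TTL."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,  # 1-hour TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # cache key -> expiry timestamp

    @staticmethod
    def make_key(tool_name: str, tool_input: dict) -> str:
        # sort_keys makes the key independent of dict insertion order.
        payload = json.dumps(
            {"tool": tool_name, "input": tool_input}, sort_keys=True, default=str
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def approve(self, tool_name: str, tool_input: dict) -> None:
        key = self.make_key(tool_name, tool_input)
        self._entries[key] = self._clock() + self._ttl

    def is_approved(self, tool_name: str, tool_input: dict) -> bool:
        key = self.make_key(tool_name, tool_input)
        expires_at = self._entries.get(key)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            # Expired entries are evicted on lookup.
            del self._entries[key]
            return False
        return True
```

Passing a fake `clock` makes expiry testable without sleeping; in normal use the default monotonic clock avoids surprises from wall-clock adjustments.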

## DW-REQ-012.9: PreToolUse Hook

1. The hook MUST fire on all PreToolUse events (empty matcher).
2. The hook MUST skip the `appeal_tool_requirement` MCP tool to prevent infinite loops (substring match on raw tool name).
3. If the sidecar is unreachable (port file missing or PID dead), the hook MUST deny with an error message instructing the user to restart the MCP server (fail-closed).
4. If communication with the sidecar fails, the hook MUST deny with an error message (fail-closed).
5. The hook MUST use `hookSpecificOutput.permissionDecision: "deny"` format for Claude Code PreToolUse events.
6. The hook MUST use the cross-platform wrapper system (`run_hook`, `HookInput`, `HookOutput`).

## DW-REQ-012.10: Sidecar HTTP Server

1. The sidecar MUST start as a daemon thread alongside the MCP server when policy files exist.
2. The sidecar MUST bind to `127.0.0.1` on a random port.
3. The sidecar MUST write a port file to `.deepwork/tmp/tool_req_sidecar/<PID>.json` containing `{"pid": <PID>, "port": <PORT>}`.
4. The sidecar MUST provide `POST /check` and `POST /appeal` endpoints.
5. The sidecar MUST clean up its port file and any session mapping files on exit.
6. Session IDs used in filenames MUST be validated against `^[a-zA-Z0-9_-]+$` to prevent path traversal.
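
The port-file and session-ID rules can be illustrated with a few small helpers. This is a sketch under assumptions: the function names are invented, and the session mapping filename follows the `session_<SESSION_ID>.json` pattern the spec gives for multi-instance support.

```python
from __future__ import annotations

import json
import os
import re
from pathlib import Path

# Same character class as the spec's pattern; fullmatch() also rejects a
# trailing newline, which `$` alone would tolerate.
SESSION_ID_RE = re.compile(r"[a-zA-Z0-9_-]+")


def is_safe_session_id(session_id: str) -> bool:
    """Return True if the ID is safe to embed in a filename."""
    return SESSION_ID_RE.fullmatch(session_id) is not None


def sidecar_dir(project_root: Path) -> Path:
    return project_root / ".deepwork" / "tmp" / "tool_req_sidecar"


def write_port_file(project_root: Path, port: int) -> Path:
    """Write <PID>.json containing {"pid": <PID>, "port": <PORT>}."""
    directory = sidecar_dir(project_root)
    directory.mkdir(parents=True, exist_ok=True)
    pid = os.getpid()
    path = directory / f"{pid}.json"
    path.write_text(json.dumps({"pid": pid, "port": port}), encoding="utf-8")
    return path


def session_mapping_path(project_root: Path, session_id: str) -> Path | None:
    """Return the session mapping path, or None for an unsafe session ID."""
    if not is_safe_session_id(session_id):
        return None
    return sidecar_dir(project_root) / f"session_{session_id}.json"
```

Rejecting unsafe IDs before building the path means values such as `../../etc/passwd` never reach the filesystem layer.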

## DW-REQ-012.11: Multi-Instance Support

1. When the first MCP tool call arrives with a `session_id`, the server MUST write a session mapping file at `.deepwork/tmp/tool_req_sidecar/session_<SESSION_ID>.json`.
2. The hook MUST look for a session-specific mapping file first, then fall back to scanning PID-keyed port files for live processes.
3. Stale port files (PID no longer alive) MUST be cleaned up during discovery.

## DW-REQ-012.12: Sidecar Startup Gating

1. The sidecar MUST NOT start if no `.deepwork/tool_requirements/` directory exists.
2. The sidecar MUST NOT start if the directory contains no `*.yml` files.
3. If sidecar startup fails, the MCP server MUST continue running — the failure MUST be logged as a warning.
@@ -107,3 +107,9 @@ The Claude Code plugin is the primary distribution mechanism for DeepWork on the
4. The agent body MUST instruct the subagent to read the instruction file from the user prompt, perform the review against the criteria in that file, and call `mark_review_as_passed` to report results.
5. The agent body MUST instruct the subagent not to edit files and not to explore beyond what the review instructions direct.
6. When the review formatter renders tasks with no per-rule agent persona specified (`agent_name` is `None`), it MUST default to `"reviewer"` as the `subagent_type` (see REVIEW-REQ-006.3.3c).

### PLUG-REQ-001.15: Tool Requirements PreToolUse Hook

1. The plugin MUST register a PreToolUse hook in `plugins/claude/hooks/hooks.json` with an empty matcher (matches all tool calls).
2. The hook MUST delegate to `deepwork hook tool_requirements` via a shell wrapper at `plugins/claude/hooks/tool_requirements.sh`.
3. The hook MUST skip the `appeal_tool_requirement` MCP tool to prevent infinite loops (see DW-REQ-012.9).
11 changes: 11 additions & 0 deletions plugins/claude/hooks/hooks.json
@@ -1,5 +1,16 @@
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/tool_requirements.sh"
          }
        ]
      }
    ],
    "SessionStart": [
      {
        "matcher": "",
15 changes: 15 additions & 0 deletions plugins/claude/hooks/tool_requirements.sh
@@ -0,0 +1,15 @@
#!/usr/bin/env bash
# tool_requirements.sh - PreToolUse hook for tool requirements enforcement
#
# Fires before every tool call. Delegates to the Python hook which contacts
# the MCP sidecar to check policies.
#
# Input (stdin): JSON from Claude Code PreToolUse hook
# Output (stdout): JSON with hookSpecificOutput.permissionDecision
# Exit codes:
#   0 - Expected in all cases (decision encoded in JSON output); the script
#       exits with the status of `deepwork hook tool_requirements`

INPUT=$(cat)
export DEEPWORK_HOOK_PLATFORM="claude"
echo "${INPUT}" | deepwork hook tool_requirements
exit $?
23 changes: 23 additions & 0 deletions src/deepwork/cli/serve.py
@@ -122,6 +122,9 @@ def _serve_mcp(
"# Ignore everything in this directory\n*\n# But keep this .gitignore\n!.gitignore\n"
)

    # Start tool requirements sidecar (if policies exist)
    _start_tool_requirements_sidecar(project_path)

    # Create and run server
    from deepwork.jobs.mcp.server import create_server

@@ -135,3 +138,23 @@ def _serve_mcp(
        server.run(transport="stdio")
    else:
        server.run(transport="sse", port=port)


def _start_tool_requirements_sidecar(project_path: Path) -> None:
    """Start the tool requirements sidecar if policy files exist."""
    policy_dir = project_path / ".deepwork" / "tool_requirements"
    if not policy_dir.is_dir():
        return
    if not any(policy_dir.glob("*.yml")):
        return

    try:
        from deepwork.tool_requirements.sidecar import start_sidecar

        start_sidecar(project_path)
    except Exception:
        import logging

        logging.getLogger("deepwork.tool_requirements").warning(
            "Failed to start tool requirements sidecar", exc_info=True
        )
1 change: 1 addition & 0 deletions src/deepwork/hooks/README.md
@@ -140,6 +140,7 @@ pytest tests/shell_script_tests/test_hook_wrappers.py -v
| `wrapper.py` | Cross-platform input/output normalization |
| `deepschema_write.py` | DeepSchema write-time validation hook |
| `post_commit_reminder.py` | Post-commit hook that nudges the agent to run `/review` (skips if all reviews already passed) |
| `tool_requirements.py` | PreToolUse hook for tool requirements policy enforcement |
| `claude_hook.sh` | Shell wrapper for Claude Code |
| `gemini_hook.sh` | Shell wrapper for Gemini CLI |
| `.deepreview` | Review rule ensuring hooks use correct output routing (DW-REQ-006.6) |