feat: add tool requirements policy enforcement system by nhorton · Pull Request #383 · Unsupervisedcom/deepwork

nhorton · 2026-04-15T18:31:09Z

Summary

- Adds a new policy enforcement system for AI agent tool calls, configured via .deepwork/tool_requirements/*.yml files containing RFC 2119-style requirements
- A PreToolUse hook checks policies via an HTTP sidecar server (spawned alongside the MCP server) that uses Haiku for semantic evaluation
- Failed checks can be appealed via a new appeal_tool_requirement MCP tool; approvals are cached with a 1-hour TTL
- Supports no_exception rules (which cannot be appealed), policy inheritance via extends, and parameter-level regex matching
- Fail-closed design: the hook denies all tool calls if the MCP sidecar is unreachable
- Loop prevention: the appeal tool itself is exempt from policy checking

Test plan

- 62 new unit tests covering config parsing, discovery, matching, cache, evaluator, engine, and hook
- Full test suite passes (1338 tests, 0 failures)
- ruff clean
- mypy clean
- Manual test: create a .deepwork/tool_requirements/test.yml, start the MCP server, and trigger a tool call that violates a policy
- Verify fail-closed behavior when the MCP server is stopped
- Verify the appeal flow caches the approval and that a subsequent tool call succeeds
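The appeal-and-cache behavior exercised by the last manual check can be sketched as follows, assuming approvals are keyed by policy, rule, and an opaque per-call key. `Rule`, `ApprovalCache`, `appeal`, and the `accepted` flag (standing in for whatever judgment the real appeal tool makes) are hypothetical names; only the 1-hour TTL and the no_exception behavior come from the PR description.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

APPROVAL_TTL_SECONDS = 60 * 60  # 1-hour TTL, per the PR description


@dataclass(frozen=True)
class Rule:
    policy: str
    rule_id: str
    no_exception: bool = False  # True means the rule can never be appealed


@dataclass
class ApprovalCache:
    ttl: float = APPROVAL_TTL_SECONDS
    clock: Callable[[], float] = time.monotonic
    _expires_at: dict[tuple[str, str, str], float] = field(
        default_factory=dict, init=False, repr=False
    )

    def approve(self, rule: Rule, call_key: str) -> None:
        self._expires_at[(rule.policy, rule.rule_id, call_key)] = self.clock() + self.ttl

    def is_approved(self, rule: Rule, call_key: str) -> bool:
        key = (rule.policy, rule.rule_id, call_key)
        expires_at = self._expires_at.get(key)
        if expires_at is None:
            return False
        if self.clock() >= expires_at:
            del self._expires_at[key]  # lazily evict the stale approval
            return False
        return True


def appeal(cache: ApprovalCache, rule: Rule, call_key: str, accepted: bool) -> bool:
    """Record an appeal outcome; no_exception rules are never approved."""
    if rule.no_exception:
        return False
    if accepted:
        cache.approve(rule, call_key)
    return accepted
```

Injecting the clock keeps the TTL testable without sleeping; with the default `time.monotonic`, cached approvals are unaffected by wall-clock adjustments.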

🤖 Generated with Claude Code

Introduces a PreToolUse hook-based policy system that evaluates tool calls against RFC 2119-style requirements defined in .deepwork/tool_requirements/*.yml. Policies are checked via an HTTP sidecar server (spawned alongside the MCP server) using Haiku for semantic evaluation. Failed checks can be appealed via a new appeal_tool_requirement MCP tool. Approvals are cached with a 1-hour TTL.

Key features:
- Policy files with tools, match (param regex), requirements, extends (inheritance)
- no_exception rules that cannot be appealed
- Fail-closed: hook denies if MCP sidecar is unreachable
- Loop prevention: appeal tool calls skip the hook
- Multi-instance support via PID-keyed + session-keyed port files
- Evaluator encapsulated behind ABC for future swap to direct API calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
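The `tools` and `match` fields listed in this commit suggest a two-step filter: the tool name must be listed, then each parameter regex must match. The sketch below assumes exact tool-name comparison, AND semantics across `match` entries, and `re.search` (substring) matching; `Policy` and `applies` are hypothetical names, and the real engine may differ on any of these points.

```python
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Policy:
    name: str
    tools: list[str]
    match: dict[str, str] = field(default_factory=dict)  # param name -> regex
    requirements: dict[str, str] = field(default_factory=dict)  # rule id -> text


def applies(policy: Policy, tool_name: str, tool_input: dict[str, Any]) -> bool:
    """Return True if this policy should be evaluated for the given tool call."""
    # Step 1: the tool must be one the policy targets.
    if tool_name not in policy.tools:
        return False
    # Step 2: every `match` regex must find its parameter (AND semantics).
    for param, pattern in policy.match.items():
        value = tool_input.get(param)
        if value is None or re.search(pattern, str(value)) is None:
            return False
    # A policy with no `match` entries applies to every call of its tools.
    return True
```

Filtering like this before any semantic evaluation would keep the comparatively slow Haiku check off tool calls that no policy targets.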

- engine.py: rename loop variable `f` to `failure` for clarity
- sidecar.py: move `import asyncio` to module level, fix event loop leak with try/finally, fix inaccurate comment, add session_id validation
- evaluator.py: change `continue` to `break` on raw JSON array parse, filter non-dict items in _extract_json_array
- discovery.py: fix double-name warning message, remove dead code
- test_engine.py: add type hints to MockEvaluator.evaluate, remove unused imports
- test_tool_requirements_hook.py: remove redundant test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
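The evaluator change in this commit concerns pulling a JSON array of per-rule results out of free-form model output. A minimal sketch of that kind of extraction, assuming the model's reply is either a bare JSON array or prose wrapping one; `extract_json_array` and `_filter_dicts` here are illustrative stand-ins, not the PR's code.

```python
import json
from typing import Any


def _filter_dicts(items: list[Any]) -> list[dict[str, Any]]:
    # Keep only JSON objects; stray numbers or strings in the array are dropped.
    return [item for item in items if isinstance(item, dict)]


def extract_json_array(text: str) -> list[dict[str, Any]]:
    # 1) The whole reply may already be a JSON array.
    try:
        parsed: Any = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, list):
        return _filter_dicts(parsed)
    # 2) Otherwise, try the span from the first "[" to the last "]".
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        parsed = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return []  # bracketed text that is not valid JSON
    return _filter_dicts(parsed) if isinstance(parsed, list) else []
```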

- doc/mcp_interface.md: add appeal_tool_requirement as tool #12, bump count
- doc/architecture.md: add tool_requirements/ package and hook to structure
- CLAUDE.md: add tool_requirements/ and hook to project structure appendix
- src/deepwork/hooks/README.md: add tool_requirements.py to files table
- CHANGELOG.md: add tool requirements feature to Unreleased

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- evaluator.py: fix comment accuracy, extract _filter_dicts to reduce duplication (DRY)
- discovery.py: fix diamond inheritance by copying visited set per parent
- test_engine.py: remove redundant @pytest.mark.asyncio decorators, fix dict type annotation, replace internal cache access with call_count
- test_evaluator.py: add tests for HaikuSubprocessEvaluator, deduplication, non-dict filtering, and invalid bracket JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Create DW-REQ-012-tool-requirements.md with 12 sub-requirements covering policy format, discovery, inheritance, matching, evaluation, check flow, appeal, caching, hook, sidecar, multi-instance, and startup
- Add PLUG-REQ-001.15 for the PreToolUse hook registration
- Add requirement ID references to all test module docstrings
- Add THIS TEST VALIDATES traceability comments to critical tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- DW-REQ-012.5.3: make SHOULD violation criterion concrete and testable
- PLUG-REQ-001: fix section ordering (001.14 before 001.15)
- test_engine.py: use two-level REQ ID format (DW-REQ-012.6 not 012.6.3)
- test_hook.py: use two-level REQ ID format, fix traceability comment placement
- test_evaluator.py: move tests to correct class, remove redundant decorators

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- test_tool_requirements_hook.py: move import to module level (DRY)
- test_evaluator.py: add missing blank line between classes (E302)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nhorton and others added 7 commits April 15, 2026 12:30

fix: address round-4 review findings

012f0af


nhorton wants to merge 7 commits into main from feat/tool-requirements

nhorton commented Apr 15, 2026
