feat: add tool requirements policy enforcement system #383

Open
nhorton wants to merge 7 commits into main from feat/tool-requirements



nhorton (Contributor) commented Apr 15, 2026

Summary

  • Adds a new policy enforcement system for AI agent tool calls via .deepwork/tool_requirements/*.yml files with RFC 2119-style requirements
  • PreToolUse hook checks policies via an HTTP sidecar server (spawned alongside the MCP server) using Haiku for semantic evaluation
  • Failed checks can be appealed via a new appeal_tool_requirement MCP tool; approvals are cached with a 1-hour TTL
  • Supports no_exception rules (cannot be appealed), policy inheritance via extends, and parameter-level regex matching
  • Fail-closed design: hook denies all tool calls if the MCP sidecar is unreachable
  • Loop prevention: the appeal tool itself is exempt from policy checking
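The matching step described above (tool name plus parameter-level regex) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Policy` dataclass, its field layout, and `policy_applies` are hypothetical names chosen to mirror the YAML keys mentioned in the summary.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical in-memory form of one .deepwork/tool_requirements/*.yml file."""

    name: str
    tools: list[str]
    # Parameter name -> regex that the parameter value must match.
    match: dict[str, str] = field(default_factory=dict)
    # Requirement id -> RFC 2119-style requirement text.
    requirements: dict[str, str] = field(default_factory=dict)
    # Assumed here to be a policy-level flag; the real schema may differ.
    no_exception: bool = False


def policy_applies(policy: Policy, tool_name: str, tool_input: dict) -> bool:
    """Return True when the tool is listed and every parameter regex matches."""
    if tool_name not in policy.tools:
        return False
    for param, pattern in policy.match.items():
        value = tool_input.get(param)
        # A missing parameter cannot satisfy its regex, so the policy is skipped.
        if value is None or re.search(pattern, str(value)) is None:
            return False
    return True


policy = Policy(
    name="no-force-push",
    tools=["Bash"],
    match={"command": r"git\s+push"},
    requirements={"no-force": "The command MUST NOT use --force."},
    no_exception=True,
)
```

Only policies for which `policy_applies` returns True would need to be sent on for semantic evaluation, which keeps unrelated tool calls cheap.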

Test plan

  • 62 new unit tests covering config parsing, discovery, matching, cache, evaluator, engine, and hook
  • Full test suite passes (1338 tests, 0 failures)
  • ruff clean
  • mypy clean
  • Manual test: create a .deepwork/tool_requirements/test.yml, start MCP server, trigger a tool call that violates a policy
  • Verify fail-closed behavior when MCP server is stopped
  • Verify appeal flow caches approval and subsequent tool call succeeds
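For reviewers checking the fail-closed and loop-prevention items by hand, the intended behavior can be sketched like this. It is an illustration under assumptions, not the hook in this PR: `check_tool_call`, the request and response shapes, and the suffix check on the tool name are all hypothetical.

```python
import json
import urllib.request

# Assumed naming: the real MCP tool name may carry a server prefix.
APPEAL_TOOL_SUFFIX = "appeal_tool_requirement"


def check_tool_call(
    sidecar_url: str, tool_name: str, tool_input: dict, timeout: float = 2.0
) -> dict:
    """Ask the sidecar for a decision and deny on any failure (fail-closed)."""
    # Loop prevention: the appeal tool itself is never policy-checked.
    if tool_name.endswith(APPEAL_TOOL_SUFFIX):
        return {"decision": "allow", "reason": "appeal tool is exempt"}

    body = json.dumps({"tool_name": tool_name, "tool_input": tool_input})
    request = urllib.request.Request(
        sidecar_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            decision = json.loads(response.read().decode("utf-8"))
    except Exception as exc:
        # Deliberately broad: connection errors, timeouts, and bad JSON all deny.
        return {"decision": "deny", "reason": f"policy check failed: {exc}"}

    if not isinstance(decision, dict) or decision.get("decision") not in ("allow", "deny"):
        return {"decision": "deny", "reason": "malformed sidecar response"}
    return decision
```

With the sidecar stopped, any non-exempt call should come back as a deny, which is the condition the second-to-last test plan item checks.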

🤖 Generated with Claude Code

nhorton and others added 7 commits April 15, 2026 12:30
Introduces a PreToolUse hook-based policy system that evaluates tool calls
against RFC 2119-style requirements defined in .deepwork/tool_requirements/*.yml.
Policies are checked via an HTTP sidecar server (spawned alongside the MCP server)
using Haiku for semantic evaluation. Failed checks can be appealed via a new
appeal_tool_requirement MCP tool. Approvals are cached with a 1-hour TTL.
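As a rough illustration of the 1-hour approval cache, a TTL cache can be as small as the sketch below. The class name, the string cache key, and the injectable clock are assumptions for the example; the cache module in this PR may key and expire entries differently.

```python
import time
from typing import Callable


class ApprovalCache:
    """Remembers approved appeals for a fixed time-to-live (default 1 hour)."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests do not have to sleep
        self._expires_at: dict[str, float] = {}

    def approve(self, key: str) -> None:
        """Record an approval that expires ttl_seconds from now."""
        self._expires_at[key] = self._clock() + self._ttl

    def is_approved(self, key: str) -> bool:
        """Return True while the approval is still within its TTL."""
        expires_at = self._expires_at.get(key)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            # Drop expired entries lazily on lookup.
            del self._expires_at[key]
            return False
        return True
```

Using a monotonic clock avoids approvals being extended or cut short by wall-clock adjustments.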

Key features:
- Policy files with tools, match (param regex), requirements, extends (inheritance)
- no_exception rules that cannot be appealed
- Fail-closed: hook denies if MCP sidecar is unreachable
- Loop prevention: appeal tool calls skip the hook
- Multi-instance support via PID-keyed + session-keyed port files
- Evaluator encapsulated behind ABC for future swap to direct API calls
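The evaluator abstraction in the last bullet might look something like the following. The class names and the `evaluate` signature are assumptions made for illustration; only the idea of an abstract base class with a swappable implementation comes from this commit message.

```python
import asyncio
from abc import ABC, abstractmethod


class RequirementEvaluator(ABC):
    """Hypothetical interface: decide which requirements a tool call violates."""

    @abstractmethod
    async def evaluate(
        self, tool_name: str, tool_input: dict, requirements: dict[str, str]
    ) -> list[dict]:
        """Return one dict per violated requirement (empty list means pass)."""


class MockEvaluator(RequirementEvaluator):
    """Test double that returns canned failures and counts its calls."""

    def __init__(self, failures: list[dict]) -> None:
        self.failures = failures
        self.call_count = 0

    async def evaluate(
        self, tool_name: str, tool_input: dict, requirements: dict[str, str]
    ) -> list[dict]:
        self.call_count += 1
        return list(self.failures)


evaluator = MockEvaluator([{"id": "no-force", "reason": "uses --force"}])
result = asyncio.run(
    evaluator.evaluate("Bash", {"command": "git push --force"}, {"no-force": "MUST NOT"})
)
```

Because callers depend only on the base class, a subprocess-based evaluator could later be replaced by one that calls an API directly without touching the engine.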

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- engine.py: rename loop variable `f` to `failure` for clarity
- sidecar.py: move `import asyncio` to module level, fix event loop leak
  with try/finally, fix inaccurate comment, add session_id validation
- evaluator.py: change `continue` to `break` on raw JSON array parse,
  filter non-dict items in _extract_json_array
- discovery.py: fix double-name warning message, remove dead code
- test_engine.py: add type hints to MockEvaluator.evaluate, remove unused imports
- test_tool_requirements_hook.py: remove redundant test
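The non-dict filtering mentioned for the evaluator can be illustrated with a small sketch. This is not the `_extract_json_array` from the PR; `extract_json_array` below is a hypothetical stand-in that shows one way to pull a JSON array out of free-form model output and keep only its dict items.

```python
import json


def extract_json_array(text: str) -> list[dict]:
    """Find the first parseable JSON array in text and keep only its dict items."""
    decoder = json.JSONDecoder()
    start = text.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list):
            # Drop strings, numbers, and other stray items the model may emit.
            return [item for item in value if isinstance(item, dict)]
        # Not valid JSON at this bracket: try the next one.
        start = text.find("[", start + 1)
    return []
```

Returning as soon as the first valid array is found mirrors the `continue` to `break` change described above, so later bracketed text cannot override an already parsed result.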

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- doc/mcp_interface.md: add appeal_tool_requirement as tool #12, bump count
- doc/architecture.md: add tool_requirements/ package and hook to structure
- CLAUDE.md: add tool_requirements/ and hook to project structure appendix
- src/deepwork/hooks/README.md: add tool_requirements.py to files table
- CHANGELOG.md: add tool requirements feature to Unreleased

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- evaluator.py: fix comment accuracy, extract _filter_dicts to reduce duplication
- discovery.py: fix diamond inheritance by copying visited set per parent
- test_engine.py: remove redundant @pytest.mark.asyncio decorators,
  fix dict type annotation, replace internal cache access with call_count
- test_evaluator.py: add tests for HaikuSubprocessEvaluator, deduplication,
  non-dict filtering, and invalid bracket JSON
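The diamond inheritance fix in discovery.py can be illustrated as follows. The function and data layout are hypothetical; the point is only that each parent receives its own copy of the visited set, so a shared ancestor reached through two branches is not mistaken for a cycle.

```python
def resolve_requirements(
    name: str, policies: dict[str, dict], visited: frozenset = frozenset()
) -> dict[str, str]:
    """Merge requirements from all ancestors, then apply the policy's own."""
    if name in visited:
        raise ValueError(f"circular extends chain at {name!r}")
    # A new set per call: sibling branches never see each other's ancestors.
    path = visited | {name}
    merged: dict[str, str] = {}
    for parent in policies[name].get("extends", []):
        merged.update(resolve_requirements(parent, policies, path))
    # The policy's own requirements override inherited ones with the same id.
    merged.update(policies[name].get("requirements", {}))
    return merged


# Diamond: child -> left -> base and child -> right -> base.
diamond = {
    "base": {"requirements": {"r1": "MUST log the call"}},
    "left": {"extends": ["base"], "requirements": {"r2": "SHOULD be read-only"}},
    "right": {"extends": ["base"], "requirements": {"r3": "MUST NOT delete files"}},
    "child": {"extends": ["left", "right"], "requirements": {"r4": "MAY use a cache"}},
}
resolved = resolve_requirements("child", diamond)
```

With a single mutable visited set shared across parents, `base` would already be marked when the `right` branch reaches it and would be rejected as a cycle, which is the bug the commit describes.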

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Create DW-REQ-012-tool-requirements.md with 12 sub-requirements
  covering policy format, discovery, inheritance, matching, evaluation,
  check flow, appeal, caching, hook, sidecar, multi-instance, and startup
- Add PLUG-REQ-001.15 for the PreToolUse hook registration
- Add requirement ID references to all test module docstrings
- Add THIS TEST VALIDATES traceability comments to critical tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- DW-REQ-012.5.3: make SHOULD violation criterion concrete and testable
- PLUG-REQ-001: fix section ordering (001.14 before 001.15)
- test_engine.py: use two-level REQ ID format (DW-REQ-012.6 not 012.6.3)
- test_hook.py: use two-level REQ ID format, fix traceability comment placement
- test_evaluator.py: move tests to correct class, remove redundant decorators

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- test_tool_requirements_hook.py: move import to module level (DRY)
- test_evaluator.py: add missing blank line between classes (E302)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
