
feat(evaluators): add ATR threat detection evaluator#170

Open
eeee2345 wants to merge 2 commits into agentcontrol:main from eeee2345:feat/atr-evaluator

eeee2345 commented Apr 9, 2026

Summary

Add a contrib evaluator using ATR (Agent Threat Rules) for regex-based AI agent threat detection. No API keys required.

Resolves #169

Evaluator details

  • Name: atr.threat_rules
  • Auto-discovery: via agent_control.evaluators entry point
  • Config: ATRConfig with min_severity, categories, block_on_match, on_error
  • Rules: 20 ATR rules, 306 compiled patterns across 8 categories
  • Performance: <5ms evaluation, pure regex

Config options

ATRConfig(
    min_severity="medium",       # Filter rules below this severity
    categories=["prompt-injection"],  # Only load specific categories (empty = all)
    block_on_match=True,         # Set matched=True on detection
    on_error="allow",            # "allow" (fail-open) or "deny" (fail-closed)
)
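
To make the effect of `min_severity` and `categories` concrete, here is a minimal sketch of the filtering those options describe. The severity ordering, the rule dict shape, and the `select_rules` helper are assumptions for illustration; the PR does not show the real rules.json schema or loader.

```python
# Assumed severity ordering; the PR does not enumerate the levels.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def select_rules(rules, min_severity="medium", categories=()):
    """Keep rules at or above min_severity and, if given, in categories.

    An empty categories sequence means "all categories", mirroring the
    comment in the config example above.
    """
    threshold = SEVERITY_ORDER[min_severity]
    wanted = set(categories)
    return [
        rule
        for rule in rules
        if SEVERITY_ORDER[rule["severity"]] >= threshold
        and (not wanted or rule["category"] in wanted)
    ]


# Hypothetical rules, not taken from rules.json.
DEMO_RULES = [
    {"id": "r1", "severity": "low", "category": "prompt-injection"},
    {"id": "r2", "severity": "high", "category": "prompt-injection"},
    {"id": "r3", "severity": "critical", "category": "credential-exposure"},
]

if __name__ == "__main__":
    kept = select_rules(DEMO_RULES, "medium", ["prompt-injection"])
    print([rule["id"] for rule in kept])
```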

Key design decisions

  • Multi-match: Returns all matching rules, not just the first. Metadata includes a findings array plus backward-compatible single-match fields.
  • _coerce_to_string: Scans every priority dict field (content, input, output, text, message), not just the first one present.
  • Error handling: Respects the on_error policy. Fail-open returns an error field; fail-closed returns matched=True.
  • Follows Cisco evaluator pattern: Same directory structure, pyproject.toml, Makefile, entry points.
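
The first three decisions can be sketched together as follows. This is a simplified stand-in, not the PR's ATREvaluator: the two demo rules, the result dict shape, and the function names are assumptions, and the real evaluator returns the framework's own result type.

```python
import re
from typing import Any

PRIORITY_FIELDS = ("content", "input", "output", "text", "message")

# Hypothetical demo rules; the real patterns live in rules.json.
RULES = [
    {"id": "demo-injection", "category": "prompt-injection",
     "pattern": re.compile(r"ignore (all )?previous instructions", re.I)},
    {"id": "demo-shell", "category": "reverse-shell",
     "pattern": re.compile(r"bash -i >& /dev/tcp/", re.I)},
]


def coerce_to_string(data: Any) -> str:
    """Join every non-empty priority field of a dict, not only the first."""
    if data is None:
        return ""
    if isinstance(data, dict):
        return "\n".join(str(data[k]) for k in PRIORITY_FIELDS if data.get(k))
    return str(data)


def evaluate(data: Any, block_on_match: bool = True, on_error: str = "allow") -> dict:
    """Return all findings plus single-match fields for older consumers."""
    try:
        text = coerce_to_string(data)
        findings = [
            {"rule_id": r["id"], "category": r["category"]}
            for r in RULES
            if r["pattern"].search(text)
        ]
    except Exception as exc:
        if on_error == "deny":
            # Fail-closed: treat the failure as a match.
            return {"matched": True, "findings": []}
        # Fail-open: no match, but surface the error.
        return {"matched": False, "findings": [], "error": str(exc)}

    result = {"matched": bool(findings) and block_on_match, "findings": findings}
    if findings:
        # Backward-compatible fields mirror the first finding.
        result["rule_id"] = findings[0]["rule_id"]
        result["category"] = findings[0]["category"]
    return result
```

Collecting findings in a list comprehension rather than returning at the first hit is what makes a payload spread across `input` and `output` report both categories.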

Files

evaluators/contrib/atr/
  pyproject.toml              # hatchling build, entry point
  Makefile                    # test/lint/typecheck/build
  README.md                   # Usage docs
  src/.../threat_rules/
    config.py                 # ATRConfig
    evaluator.py              # ATREvaluator (222 lines)
    rules.json                # 20 rules, 306 patterns
  tests/
    test_evaluator.py         # 22 tests

Test plan

  • Rule loading and compilation
  • Known-bad inputs: prompt injection, jailbreak, reverse shell, credential exposure, system prompt override
  • Known-good inputs: normal text, code, URLs
  • Config: min_severity filtering, category filtering, block_on_match=False
  • Error handling: None/empty input, on_error deny/allow
  • Multi-match: content triggering multiple categories returns all findings
  • Dict input: scans all priority fields
  • make test / make lint / make typecheck

Panguard AI added 2 commits April 9, 2026 22:57
Add contrib evaluator using ATR (Agent Threat Rules) community rules:
- 20 regex rules covering OWASP Agentic Top 10
- Configurable severity threshold and category filtering
- Pure regex, no API keys, <5ms evaluation time
- Auto-discovered via entry points

Source: https://agentthreatrule.org (MIT)

- _match_rules now returns all matching rules, not just first match
- Add super().__init__(config) call
- _coerce_to_string scans all priority dict fields, not just first
- Add multi-match test and dict field scanning test
- Backward-compatible: single-match fields still in metadata
Development

Successfully merging this pull request may close these issues.

feat(evaluators): ATR regex-based threat detection evaluator
