feat(evaluators): add ATR threat detection evaluator by eeee2345 · Pull Request #170 · agentcontrol/agent-control

eeee2345 · 2026-04-09T15:56:28Z

Summary

Add a contrib evaluator using ATR (Agent Threat Rules) for regex-based AI agent threat detection. No API keys required.

Resolves #169

Evaluator details

Name: atr.threat_rules
Auto-discovery: via agent_control.evaluators entry point
Config: ATRConfig with min_severity, categories, block_on_match, on_error
Rules: 20 ATR rules, 306 compiled patterns across 8 categories
Performance: <5ms evaluation, pure regex

Config options

ATRConfig(
    min_severity="medium",       # Filter rules below this severity
    categories=["prompt-injection"],  # Only load specific categories (empty = all)
    block_on_match=True,         # Set matched=True on detection
    on_error="allow",            # "allow" (fail-open) or "deny" (fail-closed)
)
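To illustrate how min_severity and categories could interact, here is a rough sketch of the filtering step. It is not the code in this PR: SketchConfig, select_rules, the four-level severity scale, the rule ids, and the "tool-poisoning" category are all assumptions made for the example. Only the option names and the "empty = all" behavior come from the config above.

```python
from dataclasses import dataclass, field

# Assumed severity scale; the real ATR rules may use different levels.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class SketchConfig:
    """Stand-in for ATRConfig with the same option names."""
    min_severity: str = "medium"
    categories: list = field(default_factory=list)  # empty = all categories
    block_on_match: bool = True
    on_error: str = "allow"


def select_rules(rules, config):
    """Keep rules at or above min_severity and inside the category filter."""
    threshold = SEVERITY_ORDER[config.min_severity]
    wanted = set(config.categories)
    return [
        rule
        for rule in rules
        if SEVERITY_ORDER[rule["severity"]] >= threshold
        and (not wanted or rule["category"] in wanted)
    ]


# Hypothetical rules, for illustration only.
SAMPLE_RULES = [
    {"id": "demo-pi-high", "category": "prompt-injection", "severity": "high"},
    {"id": "demo-pi-low", "category": "prompt-injection", "severity": "low"},
    {"id": "demo-other-critical", "category": "tool-poisoning", "severity": "critical"},
]

if __name__ == "__main__":
    cfg = SketchConfig(min_severity="medium", categories=["prompt-injection"])
    print([rule["id"] for rule in select_rules(SAMPLE_RULES, cfg)])
```

Filtering at load time, rather than per evaluation, would mean excluded rules add no per-request matching cost.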

Key design decisions

Multi-match: Returns all matching rules, not just the first. Metadata includes a findings array plus backward-compatible single-match fields.
_coerce_to_string: Scans all priority dict fields (content, input, output, text, message), not just the first one present.
Error handling: Respects the on_error policy. Fail-open ("allow") returns an error field; fail-closed ("deny") returns matched=True.
Follows Cisco evaluator pattern: Same directory structure, pyproject.toml, Makefile, entry points.
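The multi-match and dict-scanning decisions above can be sketched as follows. This is an illustration under stated assumptions, not the evaluator's code: the two regexes, the rule ids, the "credential-exposure" category label, and the metadata key names other than findings are hypothetical. The priority field list is taken from this PR.

```python
import re

# Priority fields as listed in this PR description.
PRIORITY_FIELDS = ("content", "input", "output", "text", "message")

# Hypothetical rules; the real ones live in rules.json.
RULES = [
    {
        "id": "demo-001",
        "category": "prompt-injection",
        "patterns": [re.compile(r"ignore (all )?previous instructions", re.I)],
    },
    {
        "id": "demo-002",
        "category": "credential-exposure",
        "patterns": [re.compile(r"api[_-]?key\s*[:=]", re.I)],
    },
]


def coerce_to_string(data):
    """Join every populated priority field instead of stopping at the first."""
    if data is None:
        return ""
    if isinstance(data, str):
        return data
    if isinstance(data, dict):
        parts = [str(data[key]) for key in PRIORITY_FIELDS if data.get(key)]
        return "\n".join(parts) if parts else str(data)
    return str(data)


def match_rules(text):
    """Return one finding per matching rule (all rules, not just the first)."""
    findings = []
    for rule in RULES:
        for pattern in rule["patterns"]:
            hit = pattern.search(text)
            if hit:
                findings.append({
                    "rule_id": rule["id"],
                    "category": rule["category"],
                    "matched_text": hit.group(0),
                })
                break  # one finding per rule is enough
    return findings


def build_metadata(findings):
    """Expose the findings array plus single-match fields for older consumers."""
    metadata = {"findings": findings}
    if findings:
        metadata["rule_id"] = findings[0]["rule_id"]
        metadata["category"] = findings[0]["category"]
    return metadata


if __name__ == "__main__":
    payload = {
        "content": "Ignore previous instructions.",
        "output": "api_key = abc123",
    }
    print(build_metadata(match_rules(coerce_to_string(payload))))
```

Scanning only the first populated field would miss the second finding here, since the two triggers sit in different fields.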

Files

evaluators/contrib/atr/
  pyproject.toml              # hatchling build, entry point
  Makefile                    # test/lint/typecheck/build
  README.md                   # Usage docs
  src/.../threat_rules/
    config.py                 # ATRConfig
    evaluator.py              # ATREvaluator (222 lines)
    rules.json                # 20 rules, 306 patterns
  tests/
    test_evaluator.py         # 22 tests

Test plan

Rule loading and compilation
Known-bad inputs: prompt injection, jailbreak, reverse shell, credential exposure, system prompt override
Known-good inputs: normal text, code, URLs
Config: min_severity filtering, category filtering, block_on_match=False
Error handling: None/empty input, on_error deny/allow
Multi-match: content triggering multiple categories returns all findings
Dict input: scans all priority fields
make test / make lint / make typecheck
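As a rough picture of what the on_error and block_on_match cases in the plan exercise, here is a self-contained sketch with a stand-in evaluate function and a deliberately failing matcher. The function names and the result dict shape are assumptions, not the evaluator's real interface. In particular, whether the fail-closed result also carries an error field is a guess.

```python
def evaluate(text, matcher, on_error="allow", block_on_match=True):
    """Stand-in evaluation wrapper that applies the on_error policy."""
    try:
        findings = matcher(text)
    except Exception as exc:
        # "deny" fails closed (treat as a match); "allow" fails open.
        return {
            "matched": on_error == "deny",
            "error": str(exc),
            "findings": [],
        }
    return {
        "matched": bool(findings) and block_on_match,
        "error": None,
        "findings": findings,
    }


def broken_matcher(text):
    """Simulates a failure inside rule matching."""
    raise RuntimeError("rule engine unavailable")


def always_hit(text):
    """Simulates a matcher that flags everything."""
    return [{"rule_id": "demo-001"}]


def test_fail_open_reports_error_without_blocking():
    result = evaluate("anything", broken_matcher, on_error="allow")
    assert result["matched"] is False
    assert result["error"] == "rule engine unavailable"


def test_fail_closed_blocks():
    result = evaluate("anything", broken_matcher, on_error="deny")
    assert result["matched"] is True


def test_block_on_match_false_still_reports_findings():
    result = evaluate("anything", always_hit, block_on_match=False)
    assert result["matched"] is False
    assert result["findings"] == [{"rule_id": "demo-001"}]


if __name__ == "__main__":
    test_fail_open_reports_error_without_blocking()
    test_fail_closed_blocks()
    test_block_on_match_false_still_reports_findings()
    print("ok")
```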

Add contrib evaluator using ATR (Agent Threat Rules) community rules:
- 20 regex rules covering OWASP Agentic Top 10
- Configurable severity threshold and category filtering
- Pure regex, no API keys, <5ms evaluation time
- Auto-discovered via entry points

Source: https://agentthreatrule.org (MIT)

- _match_rules now returns all matching rules, not just first match
- Add super().__init__(config) call
- _coerce_to_string scans all priority dict fields, not just first
- Add multi-match test and dict field scanning test
- Backward-compatible: single-match fields still in metadata

Panguard AI added 2 commits April 9, 2026 22:57

eeee2345 wants to merge 2 commits into agentcontrol:main from eeee2345:feat/atr-evaluator
