fix(security): close five prompt-injection defense gaps#1

Merged: jkyberneees merged 1 commit into main from fix/prompt-injection-hardening on May 31, 2026

@jkyberneees
Contributor

Summary

A code-level re-assessment of odek's documented prompt-injection defenses (per docs/SECURITY.md) surfaced five gaps where the implementation diverged from the threat model. This PR fixes all five and adds regression coverage. Existing defenses (nonce'd wrapper, danger classifier, approver friction, audit divergence) held up and are unchanged.

Findings fixed

| # | Severity | Gap | Fix |
|---|----------|-----|-----|
| 1 | High | Redirect hops not re-classified (SSRF). SECURITY.md claimed the browser "re-classifies every redirect hop", but `browser` and `http_batch` used bare `http.Client`s that auto-followed redirects without re-running `ClassifyURL`. A benign-classified URL could 302 to 169.254.169.254 (cloud metadata) or an internal host and be fetched. | Both clients install a `CheckRedirect` that re-classifies every hop through `ClassifyURL` + the danger policy and re-imposes the 10-hop limit. (The skill importer already did this; the browser just hadn't inherited it.) |
| 2 | Medium | MCP "tool poisoning". The untrusted wrapper only guarded MCP tool output; the server-controlled tool description flowed into the model's tool catalogue as trusted text. | Descriptions are scanned with `ScanInjection` at registration and withheld (tool stays callable by name, warning logged) if injection patterns are found. |
| 3 | Medium | MCP error channel bypassed the wrapper + audit log. `untrustedToolWrapper` returned the error unwrapped, yet the loop surfaces `err.Error()` to the model. | Error messages are now wrapped (and audited) too. |
| 4 | Low–Med | `session_search` re-exposed content from arbitrary (possibly tainted) past sessions unwrapped and unaudited, bypassing the memory taint gate. | Registered through the untrusted wrapper, so output is wrapped and the retrieval is recorded in the audit log. |
| 5 | Low | `source` attribute breakout. `wrapUntrusted` only escaped `"`; an attacker-influenced source with `>` or a newline could prematurely close the opening tag. | Source sanitised for `"`, `<`, `>`, and newlines. |

Tests

New cmd/odek/injection_hardening_test.go covers all five, including hermetic httptest redirect integration tests (allow + deny paths, proving the redirect target is re-classified through the approver and never fetched when denied). untrusted_wrapper_test.go updated: the old test pinned the vulnerable error-passthrough behavior (#3) — it now asserts the secure behavior.

  • go build ./...
  • go vet ./...
  • go test ./... ✅ (all packages)
  • gofmt clean

Docs

README.md and docs/SECURITY.md updated so the threat model matches the implementation (the redirect claim is now actually true; new defenses documented in the wrapper table and attack-vector matrix).

🤖 Generated with Claude Code

A code-level re-assessment of the documented prompt-injection defenses
surfaced five gaps where the implementation diverged from the threat
model. This fixes all of them and adds regression coverage.

1. Redirect hops were not re-classified (SSRF). SECURITY.md claimed the
   browser "re-classifies every redirect hop", but both the browser and
   http_batch used bare http.Clients that followed redirects without
   re-running ClassifyURL — a benign-classified URL could 302 to
   169.254.169.254 (cloud metadata) and be fetched. Both clients now
   install a CheckRedirect that re-classifies every hop and re-imposes
   the 10-hop limit (the skill importer already did this).

2. MCP tool descriptions were unscanned ("tool poisoning"). The untrusted
   wrapper only guarded MCP tool *output*; the server-controlled
   description flowed into the model's tool catalogue as trusted text.
   Descriptions are now scanned with ScanInjection at registration and
   withheld (tool stays callable by name) if injection patterns are found.

3. MCP error channel bypassed the wrapper + audit log. untrustedToolWrapper
   returned the error unwrapped, but the loop surfaces err.Error() to the
   model. Error messages are now wrapped (and audited) too.

4. session_search re-exposed past-session content unwrapped + unaudited,
   bypassing the memory taint gate. It is now registered through the
   untrusted wrapper so its output is wrapped and the retrieval is logged.

5. wrapUntrusted's source attribute only escaped `"`. An attacker-influenced
   source containing `>` or a newline could prematurely close the opening
   tag. The source is now sanitised for `"`, `<`, `>`, and newlines.

Docs (README, SECURITY.md) updated to match. Full suite + go vet green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jkyberneees jkyberneees merged commit b84b2e0 into main May 31, 2026
6 checks passed