fix(security): close five prompt-injection defense gaps#1
Merged
Conversation
A code-level re-assessment of the documented prompt-injection defenses
surfaced five gaps where the implementation diverged from the threat
model. This fixes all of them and adds regression coverage.
1. Redirect hops were not re-classified (SSRF). SECURITY.md claimed the
browser "re-classifies every redirect hop", but both the browser and
http_batch used bare http.Clients that followed redirects without
re-running ClassifyURL — a benign-classified URL could 302 to
169.254.169.254 (cloud metadata) and be fetched. Both clients now
install a CheckRedirect that re-classifies every hop and re-imposes
the 10-hop limit (the skill importer already did this).
2. MCP tool descriptions were unscanned ("tool poisoning"). The untrusted
wrapper only guarded MCP tool *output*; the server-controlled
description flowed into the model's tool catalogue as trusted text.
Descriptions are now scanned with ScanInjection at registration and
withheld (tool stays callable by name) if injection patterns are found.
3. MCP error channel bypassed the wrapper + audit log. untrustedToolWrapper
returned the error unwrapped, but the loop surfaces err.Error() to the
model. Error messages are now wrapped (and audited) too.
4. session_search re-exposed past-session content unwrapped + unaudited,
bypassing the memory taint gate. It is now registered through the
untrusted wrapper so its output is wrapped and the retrieval is logged.
5. wrapUntrusted's source attribute only escaped `"`. An attacker-influenced
source containing `>` or a newline could prematurely close the opening
tag. The source is now sanitised for `"`, `<`, `>`, and newlines.
Docs (README, SECURITY.md) updated to match. Full suite + go vet green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A code-level re-assessment of odek's documented prompt-injection defenses (per
docs/SECURITY.md) surfaced five gaps where the implementation diverged from the threat model. This PR fixes all five and adds regression coverage. Existing defenses (nonce'd wrapper, danger classifier, approver friction, audit divergence) held up and are unchanged.Findings fixed
SECURITY.mdclaimed the browser "re-classifies every redirect hop", butbrowserandhttp_batchused barehttp.Clients that auto-followed redirects without re-runningClassifyURL. A benign-classified URL could302to169.254.169.254(cloud metadata) / an internal host and be fetched.CheckRedirectthat re-classifies every hop throughClassifyURL+ the danger policy and re-imposes the 10-hop limit. (The skill importer already did this — the browser just hadn't inherited it.)ScanInjectionat registration and withheld (tool stays callable by name, warning logged) if injection patterns are found.untrustedToolWrapperreturned the error unwrapped, yet the loop surfaceserr.Error()to the model.session_searchre-exposed content from arbitrary (possibly tainted) past sessions unwrapped and unaudited — bypassing the memory taint gate.sourceattribute breakout.wrapUntrustedonly escaped"; an attacker-influenced source with>or a newline could prematurely close the opening tag.",<,>, and newlines.Tests
New
cmd/odek/injection_hardening_test.gocovers all five, including hermetichttptestredirect integration tests (allow + deny paths, proving the redirect target is re-classified through the approver and never fetched when denied).untrusted_wrapper_test.goupdated: the old test pinned the vulnerable error-passthrough behavior (#3) — it now asserts the secure behavior.go build ./...✅go vet ./...✅go test ./...✅ (all packages)gofmtcleanDocs
README.mdanddocs/SECURITY.mdupdated so the threat model matches the implementation (the redirect claim is now actually true; new defenses documented in the wrapper table and attack-vector matrix).🤖 Generated with Claude Code