fix(security): close five prompt-injection defense gaps by jkyberneees · Pull Request #1 · BackendStack21/odek

jkyberneees · 2026-05-31T07:30:04Z

Summary

A code-level re-assessment of odek's documented prompt-injection defenses (per docs/SECURITY.md) surfaced five gaps where the implementation diverged from the threat model. This PR fixes all five and adds regression coverage. Existing defenses (nonce'd wrapper, danger classifier, approver friction, audit divergence) held up and are unchanged.

Findings fixed

| # | Severity | Gap | Fix |
|---|---|---|---|
| 1 | High | Redirect hops not re-classified (SSRF). `SECURITY.md` claimed the browser "re-classifies every redirect hop", but `browser` and `http_batch` used bare `http.Client`s that auto-followed redirects without re-running `ClassifyURL`. A benign-classified URL could `302` to `169.254.169.254` (cloud metadata) or an internal host and be fetched. | Both clients install a `CheckRedirect` that re-classifies every hop through `ClassifyURL` + the danger policy and re-imposes the 10-hop limit. (The skill importer already did this; the browser just hadn't inherited it.) |
| 2 | Medium | MCP "tool poisoning". The untrusted wrapper only guarded MCP tool output; the server-controlled tool description flowed into the model's tool catalogue as trusted text. | Descriptions are scanned with `ScanInjection` at registration and withheld (tool stays callable by name, warning logged) if injection patterns are found. |
| 3 | Medium | MCP error channel bypassed the wrapper + audit log. `untrustedToolWrapper` returned the error unwrapped, yet the loop surfaces `err.Error()` to the model. | Error messages are now wrapped (and audited) too. |
| 4 | Low–Med | `session_search` re-exposed content from arbitrary (possibly tainted) past sessions unwrapped and unaudited, bypassing the memory taint gate. | Registered through the untrusted wrapper, so output is wrapped and the retrieval is recorded in the audit log. |
| 5 | Low | `source` attribute breakout. `wrapUntrusted` only escaped `"`; an attacker-influenced source with `>` or a newline could prematurely close the opening tag. | Source sanitised for `"`, `<`, `>`, and newlines. |

Tests

New cmd/odek/injection_hardening_test.go covers all five, including hermetic httptest redirect integration tests (allow + deny paths, proving the redirect target is re-classified through the approver and never fetched when denied). untrusted_wrapper_test.go updated: the old test pinned the vulnerable error-passthrough behavior (#3) — it now asserts the secure behavior.
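
A hermetic allow/deny redirect check of the kind described above might look roughly like the following. It is a sketch only: `newCheckedClient`, `redirectReachesTarget`, and the `classify` callback are hypothetical stand-ins for odek's clients, `ClassifyURL`, and the approver, and the real tests in `injection_hardening_test.go` may be structured differently. One fact worth noting is that setting `CheckRedirect` replaces net/http's default policy, which is why the 10-hop limit has to be restated.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync/atomic"
)

// newCheckedClient returns a client whose CheckRedirect re-runs classify on
// every redirect hop. classify is a hypothetical stand-in for ClassifyURL
// plus the danger policy: a non-nil error means "do not fetch".
func newCheckedClient(classify func(*url.URL) error) *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			// A custom CheckRedirect replaces the default policy,
			// so the 10-hop cap is restated here.
			if len(via) >= 10 {
				return errors.New("stopped after 10 redirects")
			}
			return classify(req.URL)
		},
	}
}

// redirectReachesTarget serves a 302 from one local server to another and
// reports whether the second server was ever contacted.
func redirectReachesTarget(deny bool) (bool, error) {
	var hits int32
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&hits, 1)
	}))
	defer target.Close()

	entry := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, target.URL, http.StatusFound)
	}))
	defer entry.Close()

	targetURL, err := url.Parse(target.URL)
	if err != nil {
		return false, err
	}
	client := newCheckedClient(func(u *url.URL) error {
		if deny && u.Host == targetURL.Host {
			return fmt.Errorf("redirect to %s denied", u.Host)
		}
		return nil
	})

	resp, err := client.Get(entry.URL)
	if resp != nil {
		resp.Body.Close() // safe even when CheckRedirect already closed it
	}
	return atomic.LoadInt32(&hits) > 0, err
}

func main() {
	for _, deny := range []bool{false, true} {
		hit, err := redirectReachesTarget(deny)
		fmt.Printf("deny=%v targetFetched=%v err=%v\n", deny, hit, err)
	}
}
```

The deny path is the interesting one: the request counter on the target server shows the redirect target was never contacted, rather than merely that the client returned an error.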

go build ./... ✅
go vet ./... ✅
go test ./... ✅ (all packages)
gofmt clean

Docs

README.md and docs/SECURITY.md updated so the threat model matches the implementation (the redirect claim is now actually true; new defenses documented in the wrapper table and attack-vector matrix).

🤖 Generated with Claude Code

A code-level re-assessment of the documented prompt-injection defenses surfaced five gaps where the implementation diverged from the threat model. This fixes all of them and adds regression coverage.

1. Redirect hops were not re-classified (SSRF). SECURITY.md claimed the browser "re-classifies every redirect hop", but both the browser and http_batch used bare http.Clients that followed redirects without re-running ClassifyURL; a benign-classified URL could 302 to 169.254.169.254 (cloud metadata) and be fetched. Both clients now install a CheckRedirect that re-classifies every hop and re-imposes the 10-hop limit (the skill importer already did this).

2. MCP tool descriptions were unscanned ("tool poisoning"). The untrusted wrapper only guarded MCP tool *output*; the server-controlled description flowed into the model's tool catalogue as trusted text. Descriptions are now scanned with ScanInjection at registration and withheld (tool stays callable by name) if injection patterns are found.

3. MCP error channel bypassed the wrapper + audit log. untrustedToolWrapper returned the error unwrapped, but the loop surfaces err.Error() to the model. Error messages are now wrapped (and audited) too.

4. session_search re-exposed past-session content unwrapped + unaudited, bypassing the memory taint gate. It is now registered through the untrusted wrapper so its output is wrapped and the retrieval is logged.

5. wrapUntrusted's source attribute only escaped `"`. An attacker-influenced source containing `>` or a newline could prematurely close the opening tag. The source is now sanitised for `"`, `<`, `>`, and newlines.

Docs (README, SECURITY.md) updated to match. Full suite + go vet green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jkyberneees merged commit b84b2e0 into main May 31, 2026
6 checks passed

jkyberneees merged 1 commit into main from fix/prompt-injection-hardening
