feat(reactotron-mcp): expand redaction defaults and add form-urlencoded body support #1608

Merged
joshuayoes merged 1 commit into feat/mcp-redaction from feat/mcp-redaction-improvements
Apr 24, 2026

Conversation

@joshuayoes
Contributor

@joshuayoes joshuayoes commented Apr 24, 2026

Summary

Stacks on top of #1607. Expands the MCP redactor's default denylists to match the cross-tool industry consensus and adds per-field redaction for application/x-www-form-urlencoded request bodies. Research comparing how other developer tools handle this is below — the short version: the closest analogs (Proxyman MCP, Sentry MCP, GitHub MCP, Postman) all redact at the server boundary by default, and their built-in denylists are broader than what #1607 currently ships.

Changes

Default rules — additions

Header names

  • CSRF / XSRF variants: x-csrf-token, x-xsrf-token, csrf-token
  • IP-forwarding PII headers: x-forwarded-for, x-real-ip

Sensitive keys

  • Password aliases: passwd, pwd
  • Generic auth-token names: token, bearer, jwt, id_token, idtoken
  • Session & CSRF: session, sessionid, session_id, csrf, xsrf, csrf_token, xsrf_token
  • OAuth & API keys: client_secret, clientsecret, x-api-key
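As a rough illustration of how a key denylist like the one above can be applied, the sketch below walks a JSON-like value and replaces the value of any listed key. The names `SENSITIVE_KEYS`, `isSensitiveKey`, `redactByKey`, and the `[REDACTED]` marker are hypothetical, and the case-insensitive exact match is an assumption; the redactor's real API is not shown in this description.

```typescript
// Hypothetical sketch; names and marker are illustrative, not the redactor's real API.
const REDACTED = "[REDACTED]";

// A subset of the keys listed above, stored lowercase.
const SENSITIVE_KEYS: readonly string[] = [
  "passwd", "pwd", "token", "bearer", "jwt", "id_token", "idtoken",
  "session", "sessionid", "session_id", "csrf", "xsrf",
  "csrf_token", "xsrf_token", "client_secret", "clientsecret", "x-api-key",
];

// Assumption: exact, case-insensitive match (no substring matching).
function isSensitiveKey(key: string): boolean {
  return SENSITIVE_KEYS.indexOf(key.toLowerCase()) !== -1;
}

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Returns a copy of `value` with the values of sensitive keys replaced.
function redactByKey(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => redactByKey(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[key] = isSensitiveKey(key) ? REDACTED : redactByKey(value[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Because the match is exact, a key such as `author` passes through untouched, which is the trade-off against substring matching discussed in the follow-ups.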

Value patterns

  • Anthropic API keys (sk-ant-…)
  • AWS access key IDs (AKIA…)
  • Google API keys (AIza… + 35 chars)
  • Stripe secret/publishable/restricted keys, live + test ((?:sk|pk|rk)_(?:test|live)_…)
  • PEM-encoded private key blocks (RSA, EC, DSA, OPENSSH, PGP, generic)
  • GitHub PAT regex broadened from ghp_ only to gh[pousr]_, covering classic (ghp_), OAuth (gho_), user-to-server (ghu_), server-to-server (ghs_), and refresh (ghr_) tokens
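The value patterns above could be applied roughly as follows. This is a sketch under stated assumptions: only the prefixes (`sk-ant-`, `AKIA`, `AIza` + 35 chars, `(?:sk|pk|rk)_(?:test|live)_`, `gh[pousr]_`) come from the list; the tail lengths and character classes, the `VALUE_PATTERNS` and `redactValuePatterns` names, and the `[REDACTED]` marker are illustrative guesses, not the regexes shipped in this PR.

```typescript
// Hypothetical sketch; tail lengths and character classes are assumptions.
const REDACTED = "[REDACTED]";

const VALUE_PATTERNS: readonly RegExp[] = [
  /sk-ant-[A-Za-z0-9_-]{20,}/g,                      // Anthropic API keys
  /\bAKIA[0-9A-Z]{16}\b/g,                           // AWS access key IDs
  /\bAIza[0-9A-Za-z_-]{35}/g,                        // Google API keys
  /\b(?:sk|pk|rk)_(?:test|live)_[0-9A-Za-z]{10,}/g,  // Stripe keys, live + test
  /\bgh[pousr]_[A-Za-z0-9]{36,}/g,                   // GitHub token prefixes
  // PEM private key blocks (RSA, EC, DSA, OPENSSH, PGP, generic)
  /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY(?: BLOCK)?-----[\s\S]*?-----END (?:[A-Z]+ )?PRIVATE KEY(?: BLOCK)?-----/g,
];

// Replaces every match of every pattern inside a string value.
function redactValuePatterns(input: string): string {
  let out = input;
  for (const pattern of VALUE_PATTERNS) {
    out = out.replace(pattern, REDACTED);
  }
  return out;
}
```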

Form-urlencoded body redaction

A new code path catches strings shaped like k=v&k=v with no URL prefix (typical application/x-www-form-urlencoded POST bodies). If any key matches sensitiveKeys, just that value is redacted — the same semantics already used for URL query params. A strict full-match regex prevents false positives on prose that happens to contain =.
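A minimal sketch of that idea, not the regex or function used in this PR: a full-match pattern accepts only strings made entirely of `key=value` pairs joined by `&`, and only values whose key is on the denylist are replaced. `FORM_BODY`, `redactFormBody`, `safeDecode`, the short key list, and the `[REDACTED]` marker are all hypothetical, and accepting a single pair with no `&` is an assumption.

```typescript
// Hypothetical sketch; the pattern and names are illustrative.
const REDACTED = "[REDACTED]";
const SENSITIVE_KEYS: readonly string[] = ["password", "token", "client_secret", "session"];

// Full match only: the key charset excludes ":", "/", "?" and whitespace,
// so URLs and ordinary prose never qualify as a form body.
const FORM_BODY =
  /^[A-Za-z0-9_.\-\[\]%+]+=[^&=\s]*(?:&[A-Za-z0-9_.\-\[\]%+]+=[^&=\s]*)*$/;

// Decodes a form-encoded key, falling back to the raw text on malformed escapes.
function safeDecode(s: string): string {
  try {
    return decodeURIComponent(s.replace(/\+/g, " "));
  } catch (_err) {
    return s;
  }
}

function redactFormBody(input: string): string {
  if (!FORM_BODY.test(input)) return input; // not form-shaped: leave untouched
  return input
    .split("&")
    .map((pair) => {
      const eq = pair.indexOf("=");
      const rawKey = pair.slice(0, eq);
      const key = safeDecode(rawKey).toLowerCase();
      // Redact only the value; the key and the other pairs stay readable.
      return SENSITIVE_KEYS.indexOf(key) !== -1 ? rawKey + "=" + REDACTED : pair;
    })
    .join("&");
}
```

Keeping the key and the neighbouring pairs intact preserves most of the debugging value of the body while hiding the secret itself.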

Tests

105 tests passing. New coverage:

  • Each category of new default rule
  • Each new value pattern, with test literals constructed at runtime so GitHub secret-scanning doesn't flag the test file
  • Form-urlencoded body redaction, including negative tests for casual strings and URL-containing strings
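One way to build test literals at runtime, sketched here with hypothetical helper and fixture names, is to join fragments so that no contiguous token-shaped string appears in the test source. The fixture values are made up and match only the general shape of each key type.

```typescript
// Hypothetical helper: join fragments so no full token literal appears in the source file.
function assemble(...parts: string[]): string {
  return parts.join("");
}

// Fake fixtures; the values are invented and only mimic each key's shape.
const fakeAwsKeyId = assemble("AK", "IA", "EXAMPLEEXAMPLE00");
const fakeStripeKey = assemble("sk", "_test_", "0000000000000000");
const fakeGithubToken = assemble("gh", "p_", "abcdefghijklmnopqrstuvwxyz0123456789");
```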

Docs

docs/mcp.md updated to reflect the expanded default list and call out form-body handling.


Research — how other tools handle this

We spawned parallel research on how similar developer tools handle sensitive-data redaction. Full notes kept in the PR discussion; the convergent findings:

1. Redact at the server/MCP boundary — unanimous

Each of the closest analogs redacts at the server or MCP serialization layer, not in the UI and not in the model:

  • Proxyman MCP: "Sensitive data (auth tokens, passwords, API keys) is automatically redacted in responses" (docs)
  • Sentry MCP — inherits Sentry's server-side scrubber
  • GitHub MCP — scans inputs for secrets and blocks by default (changelog)
  • Postman Repro — case-insensitive default-key redaction
  • mitmproxy FilteredDumper pattern — redact at display/egress, not on the wire

OWASP MCP Top 10 — MCP01:2025 explicitly mandates: "redact or sanitize inputs and outputs before logging… redact or mask secrets before writing to logs or telemetry." (link)

2. No sensitive / secretHint annotation exists in the MCP spec today

The 2025-03-26 spec adds readOnlyHint, destructiveHint, idempotentHint, openWorldHint — but the maintainers are explicit: "clients MUST NOT rely solely on these for security decisions." (MCP blog) Treat server-side redaction as the hard boundary; don't wait for an annotation.

3. The de-facto default denylist

Union across Sentry, Bugsnag, google/har-sanitizer, Postman, Chrome DevTools sanitized HAR, Presidio:

  • Headers: Authorization, Cookie, Set-Cookie, Proxy-Authorization, X-Api-Key, X-CSRF-Token, X-XSRF-Token, X-Forwarded-For
  • Keys: password/passwd/pwd, secret, token, bearer, jwt, auth, authorization, api_key/apikey, credentials, session/sessionid, csrf/xsrf, access_token, refresh_token, id_token, client_secret, private_key
  • Value patterns: AWS (AKIA…), Google (AIza…), JWT (eyJ…), Stripe, GitHub PATs (all prefixes), PEM private key blocks, Anthropic (sk-ant-…)

This PR brings our defaults in line with that union.

4. Tool-by-tool highlights

| Tool | Redaction approach | What we took / avoided |
|---|---|---|
| Charles Proxy | None built-in; user-written Rewrite rules only | Avoid its "bring your own regex" UX — ship opinionated defaults |
| Wireshark | editcap + third-party TraceWrangler; fail-closed pattern | Noted strictMode allowlist as future work |
| Postman | "Secret" variable type masks UI only; still exfiltrated in analytics URLs — cautionary tale | Redact the fully-rendered payload at MCP boundary, not at display |
| mitmproxy / Proxyman | modify_headers, Python addons; Proxyman MCP auto-redacts but rules are opaque/non-tunable | Keep user-tunable config; don't ship an opaque rule set |
| Chrome DevTools | Export HAR (sanitized) strips Authorization, Cookie, Set-Cookie only (Chrome 130, Oct 2024) | That's the floor. We already go beyond. |
| google/har-sanitizer | Public wordlist: state, token, access_token, client_secret, SAMLRequest, etc. | Directly informed our expanded default key list |
| Cloudflare HAR sanitizer | Conditional, not denylist — strips JWT signature but keeps claims for debugging | Filed as a future enhancement (partial/format-preserving redaction) |
| Sentry / Bugsnag / Datadog / LogRocket | Opinionated server-side defaults + user-extendable via beforeSend-style hook; Datadog offers partial redaction & Luhn-validated card detection | Union of their default lists → our new defaults. Partial redaction & Luhn are follow-ups. |

Key canonical incident

Okta support breach (Oct 2023) — attacker stole HAR files from 134 customer support tickets; the HARs contained live session tokens that were used to hijack sessions at BeyondTrust, Cloudflare, and 1Password. The PR's default-on posture is the right response to this class of leak.


What is intentionally NOT in this PR

Tracked as follow-ups so the review stays focused:

  • Substring matching on keys. Sentry JS and Bugsnag match substrings; that catches sessionToken/userPassword automatically but false-positives on author/authored_by when auth is in the list. Would need a separate denylist/pattern split.
  • Typed redaction markers ([REDACTED:jwt]) and a _redacted summary sibling field. Useful for LLM reasoning and defensive-sandwich logging but changes the public output shape.
  • Luhn-validated credit-card detection. A bare 13–19 digit regex produces too many false positives on random IDs and unix timestamps; needs Luhn to be safe.
  • Cookie-value parsing within the Cookie header. Currently the whole header is blunt-redacted. Cloudflare's per-cookie approach (keep names, redact values) would preserve more debug info.
  • Partial / format-preserving masking (keep last 4 of card, keep JWT claims but strip signature) — the strongest idea from Cloudflare/Datadog, worth a dedicated PR.
  • strictMode allowlist (à la TraceWrangler's "drop unknown layers" / mitmproxy's FilteredDumper) — only forward known-safe headers, redact the rest.
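For the Luhn follow-up, the check itself is small. The sketch below is the standard Luhn checksum gated on a 13 to 19 digit length; the function name `passesLuhn` is hypothetical, and nothing like it ships in this PR.

```typescript
// Standard Luhn checksum; hypothetical helper, not part of this PR.
function passesLuhn(digits: string): boolean {
  if (!/^\d{13,19}$/.test(digits)) return false; // card-length digit runs only
  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

A redactor would run a check like this only on candidates already matched by the digit regex. Roughly one in ten random digit strings still passes by chance, so Luhn reduces false positives on IDs and timestamps without eliminating them.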

Test plan

  • yarn test in lib/reactotron-mcp — 105 tests pass
  • yarn typecheck clean
  • yarn build succeeds
  • Reviewer sanity-check: no new default key is an obvious false-positive trigger for any team's app-specific field names
  • Reviewer sanity-check: form-encoded regex doesn't false-positive on real-world payloads in your apps

🤖 Generated with Claude Code

…ed body support

Bring the MCP redactor closer to the industry consensus denylist used by
Sentry, Bugsnag, Postman, Cloudflare, and google/har-sanitizer. Research
comparing Charles, Wireshark, Postman, mitmproxy, Proxyman, Chrome DevTools,
Sentry, and Datadog is summarized in the PR description.

Default rules:
- Add CSRF/XSRF and IP-PII header names (x-csrf-token, x-xsrf-token,
  csrf-token, x-forwarded-for, x-real-ip).
- Add common auth/session key variants (token, bearer, jwt, id_token,
  session, sessionid, csrf, xsrf, passwd, pwd, client_secret).
- Add value patterns for Anthropic keys, AWS access key IDs, Google API
  keys, Stripe live/test/restricted keys, and PEM private key blocks.
- Broaden GitHub PAT regex from ghp_ only to gh[pousr]_ (classic, server,
  OAuth, user-to-server, refresh).

Form-urlencoded bodies:
- Strings shaped like `k=v&k=v` (no URL prefix) now get the same per-field
  redaction as URL query parameters. A strict full-match regex prevents
  false positives on prose containing `=`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joshuayoes joshuayoes merged commit a47a71f into feat/mcp-redaction Apr 24, 2026
3 checks passed
@joshuayoes joshuayoes deleted the feat/mcp-redaction-improvements branch April 24, 2026 18:46
joshuayoes added a commit that referenced this pull request May 4, 2026
…ed body support (#1608)

## Summary

Stacks on top of #1607. Expands the MCP redactor's default denylists to
match the cross-tool industry consensus and adds per-field redaction for
`application/x-www-form-urlencoded` request bodies. Research comparing
how other developer tools handle this is below — the short version: the
closest analogs (Proxyman MCP, Sentry MCP, GitHub MCP, Postman) all
redact at the server boundary by default, and their built-in denylists
are broader than what #1607 currently ships.

## Changes

### Default rules — additions

**Header names**
- CSRF / XSRF variants: `x-csrf-token`, `x-xsrf-token`, `csrf-token`
- IP-forwarding PII headers: `x-forwarded-for`, `x-real-ip`

**Sensitive keys**
- Password aliases: `passwd`, `pwd`
- Generic auth-token names: `token`, `bearer`, `jwt`, `id_token`,
`idtoken`
- Session & CSRF: `session`, `sessionid`, `session_id`, `csrf`, `xsrf`,
`csrf_token`, `xsrf_token`
- OAuth & API keys: `client_secret`, `clientsecret`, `x-api-key`

**Value patterns**
- Anthropic API keys (`sk-ant-…`)
- AWS access key IDs (`AKIA…`)
- Google API keys (`AIza…` + 35 chars)
- Stripe secret/publishable/restricted keys, live + test
(`(?:sk|pk|rk)_(?:test|live)_…`)
- PEM-encoded private key blocks (RSA, EC, DSA, OPENSSH, PGP, generic)
- GitHub PAT regex broadened from `ghp_` only to `gh[pousr]_`, covering
classic (`ghp_`), OAuth (`gho_`), user-to-server (`ghu_`),
server-to-server (`ghs_`), and refresh (`ghr_`) tokens

### Form-urlencoded body redaction

A new code path catches strings shaped like `k=v&k=v` with no URL prefix
(typical `application/x-www-form-urlencoded` POST bodies). If any key
matches `sensitiveKeys`, just that value is redacted — the same
semantics already used for URL query params. A strict full-match regex
prevents false positives on prose that happens to contain `=`.

### Tests

105 tests passing. New coverage:
- Each category of new default rule
- Each new value pattern, with test literals constructed at runtime so
GitHub secret-scanning doesn't flag the test file
- Form-urlencoded body redaction, including negative tests for casual
strings and URL-containing strings

### Docs

`docs/mcp.md` updated to reflect the expanded default list and call out
form-body handling.

---

## Research — how other tools handle this

We spawned parallel research on how similar developer tools handle
sensitive-data redaction. Full notes kept in the PR discussion; the
convergent findings:

### 1. Redact at the server/MCP boundary — unanimous
Each of the closest analogs redacts at the server or MCP serialization
layer, not in the UI and not in the model:
- **Proxyman MCP** — *"Sensitive data (auth tokens, passwords, API keys)
is automatically redacted in responses"*
([docs](https://docs.proxyman.com/mcp))
- **Sentry MCP** — inherits Sentry's server-side scrubber
- **GitHub MCP** — scans inputs for secrets and blocks by default
([changelog](https://github.blog/changelog/2025-08-13-github-mcp-server-secret-scanning-push-protection-and-more/))
- **Postman Repro** — case-insensitive default-key redaction
- **mitmproxy `FilteredDumper` pattern** — redact at display/egress, not
on the wire

**OWASP MCP Top 10 — MCP01:2025** explicitly mandates: *"redact or
sanitize inputs and outputs before logging… redact or mask secrets
before writing to logs or telemetry."*
([link](https://owasp.org/www-project-mcp-top-10/2025/MCP01-2025-Token-Mismanagement-and-Secret-Exposure))

### 2. No `sensitive` / `secretHint` annotation exists in the MCP spec today
The 2025-03-26 spec adds `readOnlyHint`, `destructiveHint`,
`idempotentHint`, `openWorldHint` — but the maintainers are explicit:
*"clients MUST NOT rely solely on these for security decisions."* ([MCP
blog](https://blog.modelcontextprotocol.io/posts/2026-03-16-tool-annotations/))
Treat server-side redaction as the hard boundary; don't wait for an
annotation.

### 3. The de-facto default denylist
Union across **Sentry**, **Bugsnag**, **google/har-sanitizer**,
**Postman**, **Chrome DevTools sanitized HAR**, **Presidio**:

- Headers: `Authorization`, `Cookie`, `Set-Cookie`,
`Proxy-Authorization`, `X-Api-Key`, `X-CSRF-Token`, `X-XSRF-Token`,
`X-Forwarded-For`
- Keys: `password`/`passwd`/`pwd`, `secret`, `token`, `bearer`, `jwt`,
`auth`, `authorization`, `api_key`/`apikey`, `credentials`,
`session`/`sessionid`, `csrf`/`xsrf`, `access_token`, `refresh_token`,
`id_token`, `client_secret`, `private_key`
- Value patterns: AWS (`AKIA…`), Google (`AIza…`), JWT (`eyJ…`), Stripe,
GitHub PATs (all prefixes), PEM private key blocks, Anthropic
(`sk-ant-…`)

This PR brings our defaults in line with that union.

### 4. Tool-by-tool highlights

| Tool | Redaction approach | What we took / avoided |
|---|---|---|
| **Charles Proxy** | None built-in; user-written Rewrite rules only | Avoid its "bring your own regex" UX — ship opinionated defaults |
| **Wireshark** | `editcap` + third-party TraceWrangler; fail-closed pattern | Noted `strictMode` allowlist as future work |
| **Postman** | "Secret" variable type masks UI only; still exfiltrated in analytics URLs — cautionary tale | Redact the fully-rendered payload at MCP boundary, not at display |
| **mitmproxy / Proxyman** | `modify_headers`, Python addons; Proxyman MCP auto-redacts but rules are opaque/non-tunable | Keep user-tunable config; don't ship an opaque rule set |
| **Chrome DevTools** | `Export HAR (sanitized)` strips `Authorization`, `Cookie`, `Set-Cookie` only (Chrome 130, Oct 2024) | That's the floor. We already go beyond. |
| **google/har-sanitizer** | Public [wordlist](https://github.com/google/har-sanitizer/blob/master/harsanitizer/static/wordlist.json) — `state`, `token`, `access_token`, `client_secret`, `SAMLRequest`, etc. | Directly informed our expanded default key list |
| **Cloudflare HAR sanitizer** | Conditional, not denylist — strips JWT signature but keeps claims for debugging | Filed as a future enhancement (partial/format-preserving redaction) |
| **Sentry / Bugsnag / Datadog / LogRocket** | Opinionated server-side defaults + user-extendable via `beforeSend`-style hook; Datadog offers partial redaction & Luhn-validated card detection | Union of their default lists → our new defaults. Partial redaction & Luhn are follow-ups. |

### Key canonical incident
**Okta support breach (Oct 2023)** — attacker stole HAR files from 134
customer support tickets; the HARs contained live session tokens that
were used to hijack sessions at BeyondTrust, Cloudflare, and 1Password.
The PR's default-on posture is the right response to this class of leak.

---

## What is intentionally NOT in this PR

Tracked as follow-ups so the review stays focused:

- **Substring matching on keys.** Sentry JS and Bugsnag match
substrings; that catches `sessionToken`/`userPassword` automatically but
false-positives on `author`/`authored_by` when `auth` is in the list.
Would need a separate denylist/pattern split.
- **Typed redaction markers** (`[REDACTED:jwt]`) and a `_redacted`
summary sibling field. Useful for LLM reasoning and defensive-sandwich
logging but changes the public output shape.
- **Luhn-validated credit-card detection.** A bare 13–19 digit regex
produces too many false positives on random IDs and unix timestamps;
needs Luhn to be safe.
- **Cookie-value parsing within the `Cookie` header.** Currently the
whole header is blunt-redacted. Cloudflare's per-cookie approach (keep
names, redact values) would preserve more debug info.
- **Partial / format-preserving masking** (keep last 4 of card, keep JWT
claims but strip signature) — the strongest idea from
Cloudflare/Datadog, worth a dedicated PR.
- **`strictMode` allowlist** (à la TraceWrangler's "drop unknown layers"
/ mitmproxy's `FilteredDumper`) — only forward known-safe headers,
redact the rest.

## Test plan

- [x] `yarn test` in `lib/reactotron-mcp` — 105 tests pass
- [x] `yarn typecheck` clean
- [x] `yarn build` succeeds
- [ ] Reviewer sanity-check: no new default key is an obvious
false-positive trigger for any team's app-specific field names
- [ ] Reviewer sanity-check: form-encoded regex doesn't false-positive
on real-world payloads in your apps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>