Skip to content

fix(sandbox): canonicalize HTTP request-targets before L7 policy evaluation#878

Open
johntmyers wants to merge 1 commit intomainfrom
fix/l7-path-canonicalization
Open

fix(sandbox): canonicalize HTTP request-targets before L7 policy evaluation#878
johntmyers wants to merge 1 commit intomainfrom
fix/l7-path-canonicalization

Conversation

@johntmyers
Copy link
Copy Markdown
Collaborator

Summary

  • The L7 REST proxy evaluates OPA path rules against the raw HTTP request-target and forwards the raw header block byte-for-byte to the upstream. The upstream normalizes .., %2e%2e, and // before dispatching, so a compromised agent inside the sandbox can bypass any path-based allow rule — e.g. GET /public/../secret matches /public/** at the proxy but the upstream serves /secret.
  • The inference-routing detection (normalize_inference_path) had the same normalization gap: it only stripped scheme+authority and preserved dot-segments verbatim, so the same class of payload could also dodge inference routing.
  • This PR introduces a single URL-path canonicalizer used at every L7 enforcement site. The canonical path is both the input to OPA and the bytes written onto the wire — closing the parser-differential between the policy engine and the upstream.

Related Issue

OS-99

Changes

  • crates/openshell-sandbox/src/l7/path.rs (new)
    • canonicalize_request_target(target, opts) -> Result<(CanonicalPath, Option<String> /* raw query */), CanonicalizeError>.
    • Percent-decodes (uppercase hex on re-emit), resolves . / .. per RFC 3986 5.2.4 (rejects underflow rather than silently clamping to /), collapses doubled slashes, strips trailing ;params, handles origin-form and absolute-form targets, enforces a 4 KiB length bound, rejects fragments / raw non-ASCII / control bytes / encoded slashes (unless explicitly opted in per-endpoint via allow_encoded_slash).
    • Returns a rewritten flag so callers know whether the outbound request line needs to be rebuilt.
  • crates/openshell-sandbox/src/l7/rest.rs
    • parse_http_request canonicalizes the request-target before returning the L7Request. If the canonical form differs from the raw input, the request line in raw_header is rebuilt via rewrite_request_line_target so the upstream dispatches on the same bytes the policy evaluated.
    • parse_query_params is now pub(crate) so proxy.rs can reuse it.
  • crates/openshell-sandbox/src/proxy.rs
    • The forward (plain HTTP proxy) L7 evaluation site canonicalizes the request-target. Non-canonical input is denied with 400 Bad Request and an OCSF NetworkActivity Fail event instead of being silently passed through.
    • normalize_inference_path is now a thin wrapper over the canonicalizer (falls back to the raw path on canonicalization error so the normal forward path can reject properly).
  • crates/openshell-sandbox/data/sandbox-policy.rego
    • Comment documenting the invariant: input.request.path is pre-canonicalized, so rules must not attempt defensive matching against .. or %2e%2e.
  • architecture/sandbox.md
    • Adds the new l7/path.rs module to the component table.

Testing

  • cargo test -p openshell-sandbox --lib — 513 tests pass, including 24 new tests in l7::path::tests (dot segments, percent-encoded dot segments, double-slash collapse, encoded-slash reject/opt-in, null/control byte rejection, fragment rejection, absolute-form stripping, legitimate %20 / %40 / %C3%A9 round-tripping, mixed-case percent normalization, length bound, non-ASCII rejection, query-suffix separation, rewritten-flag semantics).
  • cargo test -p openshell-sandbox — 518 tests across all targets pass.
  • mise run pre-commit — passes (lint, format, license headers, rust tests).
  • Regression tests for the documented bypass payloads:
    • /public/../secret/secret
    • /public/%2e%2e/secret/secret
    • /public/%2E%2E%2FsecretEncodedSlashNotAllowed
    • //public/../secret/secret
    • /public/./../secret/secret
    • /public/%00/secretNullOrControlByte
    • /..TraversalAboveRoot
  • Legitimate traffic continues to pass: /files/hello%20world.txt, /users/caf%C3%A9, /v1/chat/completions?stream=true, paths with : and @ in segments.

Trade-offs

  • %2F inside a segment is rejected by default. Operators who need it for artifact-registry-style APIs can opt in per-endpoint via allow_encoded_slash: true (the plumbing is in place — endpoint config wire-up can land in a follow-up once there's a consumer).
  • Non-canonical requests are always hard-rejected regardless of EnforcementMode::Audit — audit mode is for policy-tuning, not for tolerating protocol violations.
  • %20 and similar legitimate percent-encoded bytes are preserved in the canonical form (only unreserved pchars get decoded), so policy patterns that currently match /files/hello%20world.txt continue to work.

Checklist

  • Conventional Commits format
  • No secrets or credentials in diff
  • Scoped to the issue at hand (no unrelated refactors)
  • Architecture and rego documentation updated

…uation

The L7 REST proxy evaluated OPA path rules against the raw request-target
and forwarded the raw header block byte-for-byte to the upstream. The
upstream normalized `..`, `%2e%2e`, and `//` before dispatching, so a
compromised agent inside the sandbox could bypass any path-based allow
rule by prefixing the blocked path with an allowed one (e.g.
`GET /public/../secret` matches `/public/**` at the proxy but the
upstream serves `/secret`). The inference-routing detection had the same
normalization gap via `normalize_inference_path`, which only stripped
scheme+authority and preserved dot-segments verbatim.

- Add `crates/openshell-sandbox/src/l7/path.rs` with
  `canonicalize_request_target`. Percent-decodes (with uppercase hex
  re-emission), resolves `.` / `..` segments per RFC 3986 5.2.4,
  collapses doubled slashes, strips trailing `;params`, rejects
  fragments / raw non-ASCII / control bytes / encoded slashes (unless
  explicitly enabled per-endpoint), and returns a `rewritten` flag so
  callers know when to rewrite the outbound request line.
- Wire the canonicalizer into `rest.rs::parse_http_request`. The
  canonical path is the `target` on `L7Request` that OPA sees, and
  when the canonical form differs from the raw input the request line
  is rebuilt in `raw_header` so the upstream dispatches on the exact
  bytes the policy evaluated.
- Canonicalize the forward (plain HTTP proxy) path at
  `proxy.rs`'s second L7 evaluation site. A non-canonical request-target
  is rejected with 400 Bad Request and an OCSF `NetworkActivity` Fail
  event rather than silently passed through.
- Replace the old `normalize_inference_path` body with a call to the
  canonicalizer. On canonicalization error the raw path is returned
  (so inference detection simply misses and the normal forward path
  handles and rejects).
- Document the invariant in `sandbox-policy.rego`: `input.request.path`
  is pre-canonicalized so rules must not attempt defensive matching
  against `..` or `%2e%2e`.
- Architecture doc (`architecture/sandbox.md`) lists the new module.
- 24 new unit tests cover dot segments, percent-encoded dot segments,
  double-slash collapse, encoded-slash reject/opt-in, null/control byte
  rejection, legitimate percent-encoded bytes round-tripping, mixed-case
  percent normalization, fragment rejection, absolute-form stripping,
  empty / length-bounded / non-ASCII handling, and regression payloads
  for the specific bypasses above.

Tracks OS-99.
@johntmyers johntmyers requested a review from a team as a code owner April 17, 2026 20:15
@johntmyers johntmyers added topic:l7 Application-layer policy and inspection work topic:security Security issues labels Apr 17, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Apr 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@johntmyers johntmyers self-assigned this Apr 17, 2026
@johntmyers johntmyers added the test:e2e Requires end-to-end coverage label Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:e2e Requires end-to-end coverage topic:l7 Application-layer policy and inspection work topic:security Security issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant