Skip to content

feat(transport): pluggable HttpTransport abstraction#135

Merged
chaliy merged 1 commit into
mainfrom
feat/http-transport
Jun 12, 2026
Merged

feat(transport): pluggable HttpTransport abstraction#135
chaliy merged 1 commit into
mainfrom
feat/http-transport

Conversation

@chaliy

@chaliy chaliy commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

What

Pluggable HttpTransport trait so hosts can route all fetchkit HTTP through their own egress boundary. fetchkit keeps URL validation, DNS policy (resolve-then-check producing pinned_addrs), manual per-hop redirect following, body caps/timeouts, bot-auth signing, and content conversion — only the socket-level send moves behind the trait.

Why

Host applications (Everruns) need a single host-owned egress boundary for outbound traffic (policy, audit, signing). Everruns' egress path currently bypasses fetchkit's clients entirely and loses all specialized fetchers (everruns/everruns#2118). Transport injection restores them: fetchkit stays the request/response adapter, host owns transport.

How

  • transport.rs: HttpTransport trait, TransportRequest (method/url/headers/timeout/pinned_addrs/respect_proxy_env), streaming TransportResponse, TransportError, default ReqwestTransport preserving exact historical behavior (per-request client, no redirects, proxy env suppressed, resolve_to_addrs pinning).
  • FetchOptions.transport: Option<Arc<dyn HttpTransport>> (None => ReqwestTransport); ToolBuilder::transport() for the Tool surface — all execution paths honor it.
  • All 12 fetchers now do HTTP exclusively via the transport. Single hop per call; fetchkit validates every redirect hop.

Security

  • Strictly single-hop transport: never follows redirects, never resolves DNS policy — fetchkit owns both (TM-SSRF-001/005/010, TM-NET-003/004). Custom transports MUST connect only to pinned_addrs when non-empty.
  • Strengthened: youtube/hackernews API hosts now DNS-pinned (previously unpinned); wikipedia/package_registry/arxiv/hackernews redirects now per-hop validated (previously reqwest-internal Policy::limited(3)); fix(fetchers): enforce policy for GitHub API subrequests #131 subrequest policy enforcement preserved.
  • Specialized-fetcher JSON reads now capped at max_body_size (previously unbounded resp.json()).
  • THREAT comments updated; specs/fetchers.md + specs/threat-model.md updated.

Validation

  • fmt + clippy (--all-targets --all-features -D warnings) clean
  • cargo test default + bot-auth features: all green (298 unit + suites; 8 transport tests incl. mock-transport Tool/DefaultFetcher/redirect/body-cap, pinned_addrs population)
  • --all-features live-network tests: 4 pre-existing GitHub rate-limit failures, unrelated (hit real hosts via default transport)
  • CLI smoke: http://127.0.0.1 blocked; proxy env ignored by default

Introduce a pluggable HTTP transport so a host application can route
fetchkit's outbound HTTP through its own egress boundary while fetchkit
retains all content logic and security policy.

- New `transport` module: `HttpTransport` trait, `TransportRequest`
  (with policy-pinned `pinned_addrs`), streaming `TransportResponse`,
  `TransportError`, and default `ReqwestTransport`.
- `FetchOptions` gains `transport: Option<Arc<dyn HttpTransport>>`
  (None => ReqwestTransport); Clone via Arc, manual Debug.
- `ToolBuilder::transport(Arc<dyn HttpTransport>)` threads the transport
  into the Tool's FetchOptions, so hosts that consume fetchkit through
  the Tool surface (description/schema/llmtxt + execute /
  execute_with_saver) keep that surface while owning egress. Tool and
  ToolBuilder switch to manual Debug like FetchOptions.
- All fetchers now perform HTTP exclusively through the transport. DNS
  policy resolution still runs in fetchkit and produces the pinned addrs;
  manual per-hop redirect following, bot-auth signing, and body/timeout
  caps stay in fetchkit. Only the socket send is delegated.
- `read_body_with_timeout` reads the transport's boxed byte stream so
  body-size caps and partial-content-on-timeout keep working mid-stream.
- Add `DnsPolicy::pinned_addrs` helper.
- Tests: in-memory mock transport proves GET/HEAD/redirect/body-cap via a
  custom transport plus the wikipedia fetcher; pinned_addrs population
  test; Tool::execute and Tool::execute_with_saver run entirely through
  an injected mock transport with no network.

Default behavior is unchanged (ReqwestTransport).
Copilot AI review requested due to automatic review settings June 12, 2026 01:02

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a pluggable HttpTransport abstraction so host applications can provide their own egress path for all outbound HTTP, while fetchkit continues to own URL validation, DNS policy (pinning), redirect validation, bot-auth signing, and body/timeout caps.

Changes:

  • Added HttpTransport trait + default ReqwestTransport, and plumbed transport selection through FetchOptions and the Tool surface.
  • Refactored all fetchers to perform HTTP exclusively via the transport, with manual per-hop redirect validation and DNS pinning where applicable.
  • Added integration tests for transport injection and updated threat model/spec docs + changelog.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
specs/threat-model.md Documents the security posture/ownership boundaries for the new transport layer.
specs/fetchers.md Adds spec-level documentation on transport injection and invariants (single-hop, pinned addrs).
crates/fetchkit/tests/transport.rs New integration tests validating transport injection, redirect behavior, and body caps.
crates/fetchkit/src/transport.rs New transport trait/types + default reqwest-backed transport implementation.
crates/fetchkit/src/tool.rs Adds ToolBuilder::transport() and carries transport through all Tool execution paths.
crates/fetchkit/src/lib.rs Exposes the transport module and re-exports transport types in the public API.
crates/fetchkit/src/fetchers/youtube.rs Routes YouTube fetcher HTTP via transport; adds DNS pinning for YouTube host.
crates/fetchkit/src/fetchers/wikipedia.rs Replaces reqwest redirects with manual per-hop redirects via transport.
crates/fetchkit/src/fetchers/twitter.rs Refactors Twitter fetcher to use transport + shared headers helper.
crates/fetchkit/src/fetchers/stackoverflow.rs Refactors StackOverflow API calls to use transport + capped JSON reads.
crates/fetchkit/src/fetchers/rss_feed.rs Updates to transport response types + header lookup helper.
crates/fetchkit/src/fetchers/package_registry.rs Uses transport with per-hop redirect validation for registry APIs.
crates/fetchkit/src/fetchers/hackernews.rs Uses transport with per-hop redirect validation + capped JSON parsing.
crates/fetchkit/src/fetchers/github_repo.rs Uses transport for GitHub API subrequests with DNS pinning and body caps.
crates/fetchkit/src/fetchers/github_issue.rs Uses transport for GitHub issue/PR API calls with capped JSON parsing.
crates/fetchkit/src/fetchers/github_code.rs Uses transport for GitHub contents API requests with DNS pinning.
crates/fetchkit/src/fetchers/docs_site.rs Updates to transport responses; caps llms.txt reads with streaming timeout/size.
crates/fetchkit/src/fetchers/default.rs Core refactor: manual redirect loop now executes via transport; streaming caps applied to transport bodies.
crates/fetchkit/src/fetchers/arxiv.rs Uses transport with manual per-hop redirect validation for arXiv API calls.
crates/fetchkit/src/dns.rs Adds DnsPolicy::pinned_addrs() helper to produce transport pin sets.
crates/fetchkit/src/client.rs Adds FetchOptions.transport + internal resolver to choose default transport.
CHANGELOG.md Documents the new transport abstraction and its security guarantees.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +170 to +172
if let Some(timeout) = req.timeout {
builder = builder.connect_timeout(timeout).timeout(timeout);
}
request = request.header(name, value);
}

let response = request.send().await.map_err(TransportError::Reqwest)?;
@chaliy chaliy merged commit 3717d95 into main Jun 12, 2026
12 checks passed
@chaliy chaliy deleted the feat/http-transport branch June 12, 2026 01:38
chaliy added a commit that referenced this pull request Jun 12, 2026
## Release v0.4.0

Breaking minor release: pluggable HTTP transport abstraction (#135).

### Why 0.4.0 (not 0.3.1)
`FetchOptions` gained a public `transport` field and it is an exhaustive
pub struct — formally breaking for exhaustive struct-literal
construction. Cargo auto-upgrades within `^0.3`, so the breaking change
must move out of the 0.3.x range.

### Changelog excerpt

**Highlights**
- Pluggable HTTP transport: hosts route fetchkit HTTP through their own
egress boundary (`FetchOptions::transport` / `ToolBuilder::transport`);
fetchkit keeps URL validation, DNS pinning, redirect following, signing,
body caps. Default unchanged (`ReqwestTransport`).
- Hardening: YouTube/HackerNews API hosts DNS-pinned;
Wikipedia/registry/arXiv/HackerNews redirects per-hop validated;
specialized-fetcher JSON reads capped at `max_body_size`.
- Local file saver symlink handling hardened.

**Breaking Changes**
- `FetchOptions` new public `transport` field — add it or use
`..Default::default()`.
- `FetchOptions`/`Tool`/`ToolBuilder` Debug impls are now manual.

### Verification
- `cargo fmt --all -- --check` ✓
- `cargo clippy --workspace --all-targets -- -D warnings` ✓
- `cargo test --workspace` ✓ (all suites)
- `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` ✓
- `cargo build --workspace --exclude fetchkit-python --release` ✓

After merge, release.yml creates the GitHub Release `v0.4.0` and
dispatches publish.yml (fetchkit → fetchkit-cli, in order).
chaliy added a commit to everruns/everruns that referenced this pull request Jun 12, 2026
)

## What
Replace the egress path's hand-rolled fetch pipeline with fetchkit 0.4's
transport injection: `EgressHttpTransport` implements
`fetchkit::HttpTransport` over `EgressService`, injected per execution
via `ToolBuilder::transport` when `ToolContext.egress_service` is
present. Bumps fetchkit `=0.2.0` → `=0.4.0`. Net **−721 lines**.

## Why
#2118 migrated web_fetch transport behind the egress boundary but had to
bypass fetchkit's clients, reimplementing ~990 lines of
redirect/binary/truncation logic and losing the specialized fetchers —
its docs noted "Restoring them requires upstream transport injection in
fetchkit." That upstream API shipped in fetchkit 0.4.0
(everruns/fetchkit#135). This PR adopts it: fetchkit keeps its whole
pipeline (specialized fetchers GitHub/Wikipedia/arXiv/…, DNS policy,
per-hop redirect validation, body caps, bot-auth signing) while every
HTTP hop still crosses the egress boundary.

## How
- `web_fetch_egress.rs`: 990-line reimplementation → ~115-line
`EgressHttpTransport` adapter (+ tests). Maps `TransportRequest` →
`EgressRequest` (kind=Capability, signing=PlatformDefault, per-hop
`network_access`, fetchkit's resolved `pinned_addrs` →
`EgressRequest.pinned_addrs` for TM-TOOL-018 pinning) and streams the
response body back.
- `web_fetch.rs`: stores the `ToolBuilder` template; per execution
builds an egress-routed Tool when the context has an egress service,
direct Tool otherwise. Both paths now share one execution flow (incl.
`save_to_file`).
- System-allowlist + ACL pre-flight checks on the initial URL restored
on **both** paths for distinct user-facing errors (`Endpoint blocked by
system policy…` / `URL blocked by network access policy…`); the egress
boundary remains the final enforcement point on every hop.
- Fixes a #2118 message regression: egress denials no longer use the
pre-#2113 "deployment's system allowlist" wording (now: `Outbound
request blocked by network policy: {url}`).
- Behavior changes vs #2118's egress path: specialized fetchers
restored; bot-auth signing restored (fetchkit signs each hop before it
crosses the boundary — #2118 had silently dropped signing on the egress
path); SSRF resolve-then-check now via fetchkit `DnsPolicy` (same
protections as `validate_url_dns_pinned`:
loopback/RFC1918/link-local/metadata blocked, resolve-then-check with
pinning).
- Specs updated: `specs/fetchkit.md`, `specs/system-allowlist.md`.

## Risk
- Medium: replaces the egress-path fetch pipeline. Mitigated by keeping
all #2118 egress-path tests green unchanged (routing, ACL denial, SSRF
block, save_to_file via egress) plus new adapter tests (markdown
conversion, per-hop redirect crossing, request metadata mapping,
policy-denial mapping).
- `respect_proxy_env` is intentionally not forwarded — outbound proxy
policy is host-owned at the egress boundary.

## Security
- Threat categories reviewed: TM-API-008 (SSRF — fetchkit DnsPolicy
resolve-then-check + pinned addrs forwarded to egress client),
TM-AGENT-018 (outbound URL filtering — ACL + system allowlist enforced
at egress per hop, pre-flight retained), TM-TOOL-018 (DNS pinning TOCTOU
— pinned addrs cross the boundary), TM-DOS-001/003 (body caps — owned by
fetchkit, unchanged), TM-AGENT-013 (exfiltration — egress remains the
single boundary).
- Findings: none. Enforcement is strictly equal-or-stronger than #2118
(same boundary checks, plus restored per-hop bot-auth signing). THREAT
comments updated at the new mediation points.

## Validation
- `cargo test -p everruns-core --all-features` — green (incl. all #2118
egress-path tests unchanged + 4 new adapter tests)
- `cargo clippy --all-targets --all-features -- -D warnings` — clean
- `bash scripts/lib/pre-push.sh` — 9/9 pass
- Smoke justification: the egress path is covered end-to-end at the tool
boundary by the existing #2118 tests (mock egress through
`execute_with_context`, including redirects and file saving); no
API/config/migration surface changed.

## Follow-ups
- Consolidate signing policy behind `EgressService` (platform egress
signer) so web_fetch, LLM drivers, and integrations share one signing
path — pre-existing direction, noted in
`specs/fetchkit.md`/`specs/egress.md`; deferred because no platform
signer exists yet.

## Checklist
- [x] Tests added or updated
- [x] Backward compatibility considered (tool contract unchanged; direct
path unchanged)
- [x] Security review performed against relevant threat model categories
- [ ] Final CI ran checks affected by this PR
- [ ] All review comments addressed (code change or written reasoning)

---
_Generated by [Claude
Code](https://claude.ai/code/session_019s8CCGt99LUYpALmH9iYFq)_
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants