feat(transport): pluggable HttpTransport abstraction#135
Merged
Conversation
Introduce a pluggable HTTP transport so a host application can route fetchkit's outbound HTTP through its own egress boundary while fetchkit retains all content logic and security policy. - New `transport` module: `HttpTransport` trait, `TransportRequest` (with policy-pinned `pinned_addrs`), streaming `TransportResponse`, `TransportError`, and default `ReqwestTransport`. - `FetchOptions` gains `transport: Option<Arc<dyn HttpTransport>>` (None => ReqwestTransport); Clone via Arc, manual Debug. - `ToolBuilder::transport(Arc<dyn HttpTransport>)` threads the transport into the Tool's FetchOptions, so hosts that consume fetchkit through the Tool surface (description/schema/llmtxt + execute / execute_with_saver) keep that surface while owning egress. Tool and ToolBuilder switch to manual Debug like FetchOptions. - All fetchers now perform HTTP exclusively through the transport. DNS policy resolution still runs in fetchkit and produces the pinned addrs; manual per-hop redirect following, bot-auth signing, and body/timeout caps stay in fetchkit. Only the socket send is delegated. - `read_body_with_timeout` reads the transport's boxed byte stream so body-size caps and partial-content-on-timeout keep working mid-stream. - Add `DnsPolicy::pinned_addrs` helper. - Tests: in-memory mock transport proves GET/HEAD/redirect/body-cap via a custom transport plus the wikipedia fetcher; pinned_addrs population test; Tool::execute and Tool::execute_with_saver run entirely through an injected mock transport with no network. Default behavior is unchanged (ReqwestTransport).
There was a problem hiding this comment.
Pull request overview
Introduces a pluggable HttpTransport abstraction so host applications can provide their own egress path for all outbound HTTP, while fetchkit continues to own URL validation, DNS policy (pinning), redirect validation, bot-auth signing, and body/timeout caps.
Changes:
- Added
HttpTransporttrait + defaultReqwestTransport, and plumbed transport selection throughFetchOptionsand theToolsurface. - Refactored all fetchers to perform HTTP exclusively via the transport, with manual per-hop redirect validation and DNS pinning where applicable.
- Added integration tests for transport injection and updated threat model/spec docs + changelog.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| specs/threat-model.md | Documents the security posture/ownership boundaries for the new transport layer. |
| specs/fetchers.md | Adds spec-level documentation on transport injection and invariants (single-hop, pinned addrs). |
| crates/fetchkit/tests/transport.rs | New integration tests validating transport injection, redirect behavior, and body caps. |
| crates/fetchkit/src/transport.rs | New transport trait/types + default reqwest-backed transport implementation. |
| crates/fetchkit/src/tool.rs | Adds ToolBuilder::transport() and carries transport through all Tool execution paths. |
| crates/fetchkit/src/lib.rs | Exposes the transport module and re-exports transport types in the public API. |
| crates/fetchkit/src/fetchers/youtube.rs | Routes YouTube fetcher HTTP via transport; adds DNS pinning for YouTube host. |
| crates/fetchkit/src/fetchers/wikipedia.rs | Replaces reqwest redirects with manual per-hop redirects via transport. |
| crates/fetchkit/src/fetchers/twitter.rs | Refactors Twitter fetcher to use transport + shared headers helper. |
| crates/fetchkit/src/fetchers/stackoverflow.rs | Refactors StackOverflow API calls to use transport + capped JSON reads. |
| crates/fetchkit/src/fetchers/rss_feed.rs | Updates to transport response types + header lookup helper. |
| crates/fetchkit/src/fetchers/package_registry.rs | Uses transport with per-hop redirect validation for registry APIs. |
| crates/fetchkit/src/fetchers/hackernews.rs | Uses transport with per-hop redirect validation + capped JSON parsing. |
| crates/fetchkit/src/fetchers/github_repo.rs | Uses transport for GitHub API subrequests with DNS pinning and body caps. |
| crates/fetchkit/src/fetchers/github_issue.rs | Uses transport for GitHub issue/PR API calls with capped JSON parsing. |
| crates/fetchkit/src/fetchers/github_code.rs | Uses transport for GitHub contents API requests with DNS pinning. |
| crates/fetchkit/src/fetchers/docs_site.rs | Updates to transport responses; caps llms.txt reads with streaming timeout/size. |
| crates/fetchkit/src/fetchers/default.rs | Core refactor: manual redirect loop now executes via transport; streaming caps applied to transport bodies. |
| crates/fetchkit/src/fetchers/arxiv.rs | Uses transport with manual per-hop redirect validation for arXiv API calls. |
| crates/fetchkit/src/dns.rs | Adds DnsPolicy::pinned_addrs() helper to produce transport pin sets. |
| crates/fetchkit/src/client.rs | Adds FetchOptions.transport + internal resolver to choose default transport. |
| CHANGELOG.md | Documents the new transport abstraction and its security guarantees. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+170
to
+172
| if let Some(timeout) = req.timeout { | ||
| builder = builder.connect_timeout(timeout).timeout(timeout); | ||
| } |
| request = request.header(name, value); | ||
| } | ||
|
|
||
| let response = request.send().await.map_err(TransportError::Reqwest)?; |
chaliy
added a commit
that referenced
this pull request
Jun 12, 2026
## Release v0.4.0 Breaking minor release: pluggable HTTP transport abstraction (#135). ### Why 0.4.0 (not 0.3.1) `FetchOptions` gained a public `transport` field and it is an exhaustive pub struct — formally breaking for exhaustive struct-literal construction. Cargo auto-upgrades within `^0.3`, so the breaking change must move out of the 0.3.x range. ### Changelog excerpt **Highlights** - Pluggable HTTP transport: hosts route fetchkit HTTP through their own egress boundary (`FetchOptions::transport` / `ToolBuilder::transport`); fetchkit keeps URL validation, DNS pinning, redirect following, signing, body caps. Default unchanged (`ReqwestTransport`). - Hardening: YouTube/HackerNews API hosts DNS-pinned; Wikipedia/registry/arXiv/HackerNews redirects per-hop validated; specialized-fetcher JSON reads capped at `max_body_size`. - Local file saver symlink handling hardened. **Breaking Changes** - `FetchOptions` new public `transport` field — add it or use `..Default::default()`. - `FetchOptions`/`Tool`/`ToolBuilder` Debug impls are now manual. ### Verification - `cargo fmt --all -- --check` ✓ - `cargo clippy --workspace --all-targets -- -D warnings` ✓ - `cargo test --workspace` ✓ (all suites) - `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` ✓ - `cargo build --workspace --exclude fetchkit-python --release` ✓ After merge, release.yml creates the GitHub Release `v0.4.0` and dispatches publish.yml (fetchkit → fetchkit-cli, in order).
5 tasks
chaliy
added a commit
to everruns/everruns
that referenced
this pull request
Jun 12, 2026
) ## What Replace the egress path's hand-rolled fetch pipeline with fetchkit 0.4's transport injection: `EgressHttpTransport` implements `fetchkit::HttpTransport` over `EgressService`, injected per execution via `ToolBuilder::transport` when `ToolContext.egress_service` is present. Bumps fetchkit `=0.2.0` → `=0.4.0`. Net **−721 lines**. ## Why #2118 migrated web_fetch transport behind the egress boundary but had to bypass fetchkit's clients, reimplementing ~990 lines of redirect/binary/truncation logic and losing the specialized fetchers — its docs noted "Restoring them requires upstream transport injection in fetchkit." That upstream API shipped in fetchkit 0.4.0 (everruns/fetchkit#135). This PR adopts it: fetchkit keeps its whole pipeline (specialized fetchers GitHub/Wikipedia/arXiv/…, DNS policy, per-hop redirect validation, body caps, bot-auth signing) while every HTTP hop still crosses the egress boundary. ## How - `web_fetch_egress.rs`: 990-line reimplementation → ~115-line `EgressHttpTransport` adapter (+ tests). Maps `TransportRequest` → `EgressRequest` (kind=Capability, signing=PlatformDefault, per-hop `network_access`, fetchkit's resolved `pinned_addrs` → `EgressRequest.pinned_addrs` for TM-TOOL-018 pinning) and streams the response body back. - `web_fetch.rs`: stores the `ToolBuilder` template; per execution builds an egress-routed Tool when the context has an egress service, direct Tool otherwise. Both paths now share one execution flow (incl. `save_to_file`). - System-allowlist + ACL pre-flight checks on the initial URL restored on **both** paths for distinct user-facing errors (`Endpoint blocked by system policy…` / `URL blocked by network access policy…`); the egress boundary remains the final enforcement point on every hop. - Fixes a #2118 message regression: egress denials no longer use the pre-#2113 "deployment's system allowlist" wording (now: `Outbound request blocked by network policy: {url}`). - Behavior changes vs #2118's egress path: specialized fetchers restored; bot-auth signing restored (fetchkit signs each hop before it crosses the boundary — #2118 had silently dropped signing on the egress path); SSRF resolve-then-check now via fetchkit `DnsPolicy` (same protections as `validate_url_dns_pinned`: loopback/RFC1918/link-local/metadata blocked, resolve-then-check with pinning). - Specs updated: `specs/fetchkit.md`, `specs/system-allowlist.md`. ## Risk - Medium: replaces the egress-path fetch pipeline. Mitigated by keeping all #2118 egress-path tests green unchanged (routing, ACL denial, SSRF block, save_to_file via egress) plus new adapter tests (markdown conversion, per-hop redirect crossing, request metadata mapping, policy-denial mapping). - `respect_proxy_env` is intentionally not forwarded — outbound proxy policy is host-owned at the egress boundary. ## Security - Threat categories reviewed: TM-API-008 (SSRF — fetchkit DnsPolicy resolve-then-check + pinned addrs forwarded to egress client), TM-AGENT-018 (outbound URL filtering — ACL + system allowlist enforced at egress per hop, pre-flight retained), TM-TOOL-018 (DNS pinning TOCTOU — pinned addrs cross the boundary), TM-DOS-001/003 (body caps — owned by fetchkit, unchanged), TM-AGENT-013 (exfiltration — egress remains the single boundary). - Findings: none. Enforcement is strictly equal-or-stronger than #2118 (same boundary checks, plus restored per-hop bot-auth signing). THREAT comments updated at the new mediation points. ## Validation - `cargo test -p everruns-core --all-features` — green (incl. all #2118 egress-path tests unchanged + 4 new adapter tests) - `cargo clippy --all-targets --all-features -- -D warnings` — clean - `bash scripts/lib/pre-push.sh` — 9/9 pass - Smoke justification: the egress path is covered end-to-end at the tool boundary by the existing #2118 tests (mock egress through `execute_with_context`, including redirects and file saving); no API/config/migration surface changed. ## Follow-ups - Consolidate signing policy behind `EgressService` (platform egress signer) so web_fetch, LLM drivers, and integrations share one signing path — pre-existing direction, noted in `specs/fetchkit.md`/`specs/egress.md`; deferred because no platform signer exists yet. ## Checklist - [x] Tests added or updated - [x] Backward compatibility considered (tool contract unchanged; direct path unchanged) - [x] Security review performed against relevant threat model categories - [ ] Final CI ran checks affected by this PR - [ ] All review comments addressed (code change or written reasoning) --- _Generated by [Claude Code](https://claude.ai/code/session_019s8CCGt99LUYpALmH9iYFq)_
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Pluggable
HttpTransporttrait so hosts can route all fetchkit HTTP through their own egress boundary. fetchkit keeps URL validation, DNS policy (resolve-then-check producingpinned_addrs), manual per-hop redirect following, body caps/timeouts, bot-auth signing, and content conversion — only the socket-level send moves behind the trait.Why
Host applications (Everruns) need a single host-owned egress boundary for outbound traffic (policy, audit, signing). Everruns' egress path currently bypasses fetchkit's clients entirely and loses all specialized fetchers (everruns/everruns#2118). Transport injection restores them: fetchkit stays the request/response adapter, host owns transport.
How
transport.rs:HttpTransporttrait,TransportRequest(method/url/headers/timeout/pinned_addrs/respect_proxy_env), streamingTransportResponse,TransportError, defaultReqwestTransportpreserving exact historical behavior (per-request client, no redirects, proxy env suppressed,resolve_to_addrspinning).FetchOptions.transport: Option<Arc<dyn HttpTransport>>(None => ReqwestTransport);ToolBuilder::transport()for the Tool surface — all execution paths honor it.Security
pinned_addrswhen non-empty.Policy::limited(3)); fix(fetchers): enforce policy for GitHub API subrequests #131 subrequest policy enforcement preserved.max_body_size(previously unboundedresp.json()).Validation
--all-targets --all-features -D warnings) cleancargo testdefault +bot-authfeatures: all green (298 unit + suites; 8 transport tests incl. mock-transport Tool/DefaultFetcher/redirect/body-cap, pinned_addrs population)--all-featureslive-network tests: 4 pre-existing GitHub rate-limit failures, unrelated (hit real hosts via default transport)http://127.0.0.1blocked; proxy env ignored by default