Parallelise skill file fetches and reuse HTTPS connections #5335
Merged
Collaborator
|
Commit: 48261fe |
simonfaltum
approved these changes
May 27, 2026
Member
simonfaltum
left a comment
LGTM. Focused change, the two halves (parallel fetches + shared transport with bumped MaxIdleConnsPerHost) belong together. Tests verify the right invariants (both goroutines reach the fetcher before either releases; errgroup propagates cancellation to in-flight peers) and survived -race -count=50 locally.
Nits, none blocking:
- The `8` concurrency limit and `* 2` idle-conn slack would benefit from a one-line rationale.
- `sync.OnceValue` is the codebase's lazy-init idiom (e.g. `libs/clicompat/clicompat.go`); the IIFE works fine since init is eager, just flagging for consistency.
- Skills are still installed serially across the outer loop, so the cold-start win mostly comes from the shared client keeping idle conns warm between skills, not intra-skill parallelism. Worth a sentence in the PR body.
- `TestInstallSkillToDirCancelsInFlightFetchesOnError` uses a 1s deadline; bumping to ~5s would be cheap CI insurance.
Each skill file was fetched sequentially with a fresh http.Client per call, so every file paid the full TCP+TLS handshake to raw.githubusercontent.com and `databricks aitools install --experimental` (26 skills, ~120 files) took ~40 s on a cold network. Hoist a shared http.Client with a tuned transport, and fetch a skill's files concurrently with errgroup. Wall-clock drops to under a second.

Co-authored-by: Isaac
Cover concurrent skill file fetches and errgroup cancellation so the installer performance path keeps its intended behavior.
- Switch the test's `sync.Once` to `sync.OnceFunc` to satisfy the forbidigo lint rule (the cause of the CI lint failure).
- Convert the package-level httpClient initialiser to `sync.OnceValue` for consistency with the codebase's lazy-init idiom (e.g. clicompat).
- Add a one-line rationale for the fetchConcurrency value (8) and the MaxIdleConnsPerHost * 2 idle-conn slack.
- Bump the cancellation test's outer deadline from 1s to 5s for CI slack.

Co-authored-by: Isaac
48261fe to 5564300
Collaborator
|
Commit: d62dcbb |
Summary
- Hoist a shared `http.Client` in `libs/aitools/installer/installer.go` so the transport pool reuses TCP+TLS connections across fetches.
- `MaxIdleConnsPerHost` is bumped from Go's default of 2 to 16 so parallel fetches to `raw.githubusercontent.com` actually reuse connections instead of churning handshakes.
- Fetch a skill's files concurrently in `installSkillToDir` via `errgroup.WithContext` with `SetLimit(8)`. The first error cancels in-flight peers, preserving the prior bail-on-first-error semantics.

Why
`databricks aitools install` was sequential: every file constructed a fresh `&http.Client{}` and threw it away, paying the full TCP+TLS handshake per file. For `--experimental` (26 skills × ~5 files each = ~120 HTTPS GETs to GitHub raw) that meant wall-clock was dominated by handshake round-trips.

Benchmarked locally against `https://raw.githubusercontent.com/databricks/databricks-agent-skills/v0.2.0` with a freshly built CLI on each side:

The cold-start number is what a first-time `databricks aitools install --experimental` user actually pays.

The `fetchFileFn` package var that tests stub remains unchanged, so no test mocks need updating.

Test plan
- `go test ./libs/aitools/...`: all green, including `-race -count=2`.
- `gofmt -l libs/aitools/installer/installer.go`: clean.
- `go vet ./libs/aitools/...`: clean.

This pull request and its description were written by Isaac.