Skip to content

fix(jitsu-cli): fetch function list once, not per file#1306

Merged
absorbb merged 4 commits into
newjitsufrom
fix/jitsu-cli-deploy-N-squared
May 19, 2026
Merged

fix(jitsu-cli): fetch function list once, not per file#1306
absorbb merged 4 commits into
newjitsufrom
fix/jitsu-cli-deploy-N-squared

Conversation

@vklimontovich
Copy link
Copy Markdown
Contributor

Summary

jitsu-cli deploy was making O(N²) requests to the console: for every function being deployed, deployFunction() re-fetched the entire workspace function list (including each function's code blob) just to resolve slug → id. On fusionmedialimited/jitsu-functions (≈63 functions) that's ~63 list calls × 63 rows = ~3,900 row reads per deploy, on top of the 63 PUTs. It's the visible cause of console pod CPU spikes during deploys.

This PR hoists the GET /api/<ws>/config/function call out of the per-file loop into deployFunctions(), builds slug/id lookup maps once, and passes them down. Network and DB load drop from N + N×N to N + 1.

Why this matters

  • Deploys touching the clpwf99ji0000jp0fi5hl01t8 workspace currently take ~4 min and pin one or two console pods to high CPU for that window (see run 26035361978). The hot path was the redundant list fetch, not the write.
  • The list endpoint (pages/api/[workspaceId]/config/[type]/index.ts) returns every config object's full config JSON; for functions that includes the compiled code. Multiplying that by N is wasteful even ignoring DB load.

Behavior preserved

  • Same lookup semantics: slug first, fall back to meta.id.
  • Same error path on initial list failure (process.exit(1)).
  • deployFunction() keeps an optional default-empty lookup so it stays callable standalone (e.g. from tests).
  • No change to the POST / PUT payloads or to profile-builder handling.

Test plan

  • pnpm typecheck in cli/jitsu-cli
  • pnpm build in cli/jitsu-cli
  • Manual: run pnpm jitsu-cli deploy --workspace … --apikey … against the QA workspace and confirm all functions deploy + console CPU stays flat
  • Confirm release lands in the published jitsu-cli package so the fusionmedialimited/jitsu-functions CI picks it up

Out of scope (follow-ups worth doing)

  • The same loop also calls deployFunction with kind === "profile" but the GET inside still hit /config/function — looks like a long-standing bug (profiles never resolve their existingFunctionId, so they get POSTed-new every deploy). Preserving for now to keep this PR minimal.
  • Console-side: add ?fields=id,slug projection on the list endpoint so even the single hoisted call doesn't pull every function's code.
  • Console-side: add CPU/memory requests/limits + HPA to jitsu-console (separate PR in jitsu-cloud-infra).

Previously deployFunction() refetched the entire workspace function list
on every iteration to resolve slug → id. With N functions in the
workspace that's N list calls, each returning N rows including the full
code blob — O(N²) load on the console and Postgres, and the visible
cause of console CPU spikes during fusionmedialimited/jitsu-functions
deploys.

Hoist the GET out of the loop into deployFunctions(), build a slug/id
lookup once, and pass it down.
@vklimontovich vklimontovich marked this pull request as ready for review May 18, 2026 14:54
Copy link
Copy Markdown

@jitsu-code-review jitsu-code-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a few correctness regressions in the provided change range:

  1. High — Firestore pagination can skip/duplicate documents

    • bulker/connectors/firebase/firebase.go:284,350
    • The first batch is fetched with collection.Limit(batchSize) (no deterministic ordering), but subsequent batches switch to OrderBy(firestore.DocumentID).StartAfter(lastDoc.Ref.ID). Using a cursor derived from an unordered page against an ordered query is not stable, so large collections can miss or re-read records.
    • Please keep the ordering consistent from the first page (same OrderBy for every batch) and use the same cursor form across pages.
  2. Medium — CRON_ENABLED is no longer respected

    • docker-start-console.sh:14-15
    • init() now always starts cron, whereas previous behavior honored CRON_ENABLED=0/no/false.
    • This is a user-visible behavior change and can trigger background jobs in environments that explicitly disabled cron.
  3. Medium — API rate limiting protections were removed without replacement

    • webapps/console/lib/api.ts (rate limiter enforcement block removed in nextJsApiHandler), and webapps/console/pages/api/workspace/index.ts (POST route-specific cap removed).
    • This drops request throttling safeguards, including the explicit workspace-creation cap, and reopens abuse/DoS paths for authenticated traffic.
    • If this is intentional, we need an equivalent protection at another layer before merging.

absorbb
absorbb previously approved these changes May 19, 2026
Copy link
Copy Markdown

@jitsu-code-review jitsu-code-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the perf improvement here. I found one behavior regression that should be fixed before merge.

// Fetch the existing function list once for slug/id resolution. Previously
// every deployFunction() call refetched the whole list (with code blobs) —
// O(N) requests per deploy, each pulling all N rows from the console DB.
const existingFunctions = await fetchExistingFunctions({ host, apikey, workspaceId: workspace.id! });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching this list once changes deploy semantics within a single run: after a successful create, the cache is not updated, so later files still see not found and can attempt a second POST for the same slug/id.

Before this change, each file re-fetched and the second file switched to PUT. Please update the cache after successful POST (and on relevant PUT slug changes) to preserve previous behavior.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in d5659c9. Two helpers now mutate the cache on success: cacheAfterCreate (POST) inserts the new {id, slug} into both maps, cacheAfterUpdate (PUT) refreshes the slug index (dropping the old slug → id link when the slug was renamed). The entry is the same object reference in byId and bySlug, so one mutation covers both. A second file with the same slug now resolves to the freshly-created id and switches to PUT, matching the pre-hoist behavior.

Without this, hoisting the GET out of the per-file loop changed
within-run semantics: after a successful POST the cache was stale, so a
later file with the same slug or meta.id would see "not found" and POST
a second time, producing a duplicate function. The pre-hoist code
side-stepped this because each file re-fetched the full list.

Restore the original behavior by mutating the cache on success:

- POST: insert the new {id, slug} into both byId and bySlug.
- PUT: update the entry's slug; if it differs from what we had, drop the
  stale bySlug entry so a later file deploying under the old slug
  creates a new function instead of clobbering this one.

Entry refs are shared between bySlug and byId (single object identity),
so one mutation covers both maps.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jitsu-code-review[bot]
jitsu-code-review Bot previously approved these changes May 19, 2026
Copy link
Copy Markdown

@jitsu-code-review jitsu-code-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the changes in this PR.

The hoisted existing-function cache preserves deploy semantics while reducing redundant API calls; I did not find correctness, security, or user-visible regression issues in the changed code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@jitsu-code-review jitsu-code-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the deploy cache changes in cli/jitsu-cli/src/commands/deploy.ts. The hoisted function-list fetch and cache synchronization on POST/PUT look correct, and I did not find actionable bugs, security issues, or user-visible regressions in this patch.

@absorbb absorbb merged commit beb6d4e into newjitsu May 19, 2026
7 checks passed
@absorbb absorbb deleted the fix/jitsu-cli-deploy-N-squared branch May 19, 2026 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants