refactor(rate-limit): key free-model limit on user id for authenticated requests #3004

Open
kilo-code-bot[bot] wants to merge 12 commits into main from chrarnoldus/free-model-rate-limit-by-user

kilo-code-bot Bot commented May 2, 2026

Summary

  • Free-model rate limiting is now keyed on the authenticated user id for all authenticated requests, regardless of feature or source IP.
  • Anonymous requests continue to be rate-limited per IP address, but now count only anonymous (unauthenticated) rows in free_model_usage so the limit is not skewed by authenticated users sharing the same IP.
  • Removes the USER_RATE_LIMITED_FEATURES / isUserRateLimitedFeature / Cloudflare-IP special case for cloud-agent, code-review, and app-builder — they are handled uniformly now.
  • Updates the /admin/free-model-usage panel to reflect the new semantics (see below).

Behavioural notes:

  • resolveRateLimit now awaits auth before deciding the key. The pre-auth IP fast-path for authenticated callers is gone; the trade-off is correct per-user accounting across shared-infra IPs.
  • checkFreeModelRateLimit(ipAddress) is now anonymous-only; checkFreeModelRateLimitByUser(userId) is unchanged and used for every authenticated request.
  • checkPromotionLimit (10k/24h anonymous gate) is unaffected.
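
The key selection described in these notes can be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: `AuthResult`, `RateLimitKey`, and `resolveRateLimitKey` are hypothetical names introduced here, and the real `resolveRateLimit` may take different arguments.

```typescript
// Hypothetical shapes; the real handler's types are not shown in this PR.
type AuthResult = { userId: string } | null;

type RateLimitKey =
  | { kind: "user"; userId: string }
  | { kind: "anonymous-ip"; ipAddress: string };

// Await auth first, then pick the key: the user id when the request is
// authenticated, the source IP only when it is anonymous.
async function resolveRateLimitKey(
  authPromise: Promise<AuthResult>,
  ipAddress: string,
): Promise<RateLimitKey> {
  const auth = await authPromise;
  if (auth !== null) {
    return { kind: "user", userId: auth.userId };
  }
  return { kind: "anonymous-ip", ipAddress };
}
```

Because the auth promise is passed in rather than created here, the sketch also reflects the point that auth can be started earlier in the handler and only awaited at decision time.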

Admin panel updates

  • RateLimitTesting now rate-limits the admin's own user id (getMyUsage / rateLimitMe). The previous version inserted anonymous rows for the admin's IP, which no longer affects the admin's per-user limit after this change.
  • Stats endpoint replaces windowIpsAtRequestLimit with two separate counters matching the two gates:
    • windowAnonymousIpsAtRequestLimit — anonymous IPs whose anonymous-only count has reached the limit.
    • windowUsersAtRequestLimit — authenticated users whose per-user count has reached the limit.
  • Page and card copy updated from "IP-based … applies to both anonymous and authenticated" to the hybrid per-user / per-IP description.
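
For illustration, the two counters could be derived from usage rows along these lines. This is an in-memory sketch under assumptions: the `UsageRow` shape and the `countAtLimit` helper are hypothetical, and the actual stats endpoint presumably computes these in SQL rather than in application code.

```typescript
// Hypothetical row shape using the column names mentioned in this PR.
interface UsageRow {
  ip_address: string;
  kilo_user_id: string | null;
  created_at: Date;
}

// Count keys whose in-window request count has reached the limit,
// separately for anonymous IPs and authenticated users.
function countAtLimit(rows: UsageRow[], windowStart: Date, requestLimit: number) {
  const perAnonymousIp = new Map<string, number>();
  const perUser = new Map<string, number>();

  for (const row of rows) {
    if (row.created_at.getTime() < windowStart.getTime()) continue; // outside window
    if (row.kilo_user_id === null) {
      // Anonymous row: counts toward the per-IP gate only.
      perAnonymousIp.set(row.ip_address, (perAnonymousIp.get(row.ip_address) ?? 0) + 1);
    } else {
      // Authenticated row: counts toward the per-user gate only.
      perUser.set(row.kilo_user_id, (perUser.get(row.kilo_user_id) ?? 0) + 1);
    }
  }

  const atLimit = (counts: Map<string, number>): number =>
    Array.from(counts.values()).filter((n) => n >= requestLimit).length;

  return {
    windowAnonymousIpsAtRequestLimit: atLimit(perAnonymousIp),
    windowUsersAtRequestLimit: atLimit(perUser),
  };
}
```

Note that an authenticated row never contributes to an IP's anonymous count, even when it shares the IP with anonymous traffic.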

Verification

  • Read through the changed code path in apps/web/src/app/api/openrouter/[...path]/route.ts to confirm that anonymous free-model requests still run through the promotion limit and that authenticated ones hit the user-keyed counter.
  • Confirmed the admin "Rate Limit Testing" button inserts rows keyed on kilo_user_id, so it actually triggers a 429 for the admin's own subsequent authenticated requests.
  • Reviewed feature-detection.test.ts to confirm only the removed helper's tests needed deletion.

Full pnpm typecheck / pnpm test / pnpm format were skipped locally (sandbox has no node_modules); CI will run them.

Visual Changes

Admin panel — the "IPs at Request Limit" card is replaced with two cards: "Anonymous IPs at Limit" and "Users at Limit". Intro copy updated to describe the per-user / per-IP split.

Reviewer Notes

  • Risk: the pre-auth fast path is now gone for authenticated free-model traffic. Auth latency will add to every rate-limit decision. Auth is already kicked off in parallel earlier in the handler, so the added latency should be minimal.
  • Risk: anonymous IP counts used to include authenticated rows; after this change the anonymous 200/hr limit effectively resets for those IPs on rollout. That is intentional but worth noting.
  • No DB schema change. No new PII.
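
The rollout effect on anonymous counts can be illustrated with a small in-memory sketch. It assumes the 200/hr limit and 1-hour window mentioned in this PR; `UsageRow`, `countForIp`, and `isAnonymousIpLimited` are hypothetical names, and the real check runs as a database query.

```typescript
// Hypothetical row shape using the column names mentioned in this PR.
interface UsageRow {
  ip_address: string;
  kilo_user_id: string | null;
  created_at: Date;
}

const WINDOW_MS = 60 * 60 * 1000; // 1-hour window
const ANONYMOUS_LIMIT = 200; // anonymous requests per IP per hour

// Count rows for one IP inside the window. With anonymousOnly = false this
// mirrors the old behaviour (all rows for the IP); with true, the new one.
function countForIp(
  rows: UsageRow[],
  ipAddress: string,
  now: Date,
  anonymousOnly: boolean,
): number {
  const windowStart = now.getTime() - WINDOW_MS;
  return rows.filter(
    (row) =>
      row.ip_address === ipAddress &&
      row.created_at.getTime() > windowStart &&
      (!anonymousOnly || row.kilo_user_id === null),
  ).length;
}

// New semantics: only anonymous rows count toward the per-IP gate.
function isAnonymousIpLimited(rows: UsageRow[], ipAddress: string, now: Date): boolean {
  return countForIp(rows, ipAddress, now, true) >= ANONYMOUS_LIMIT;
}
```

For an IP with 150 authenticated and 60 anonymous rows in the last hour, the old count (210) would exceed the limit while the anonymous-only count (60) would not, which is the "reset" described above.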

Database indexes on free_model_usage

No migration is introduced; the existing indexes cover the new access pattern:

  • idx_free_model_usage_user_created_at on (kilo_user_id, created_at) WHERE kilo_user_id IS NOT NULL — already serves checkFreeModelRateLimitByUser. The partial predicate matches the query predicate, so read and write costs are unchanged; the index simply becomes the hot read path for all authenticated features instead of only the three server-side ones.
  • idx_free_model_usage_ip_created_at on (ip_address, created_at) — still used for checkFreeModelRateLimit, but the query now adds kilo_user_id IS NULL. Postgres will range-scan the index and filter out non-null rows. Given the 1-hour window and per-IP volumes this is fine, but a follow-up could make this a partial index WHERE kilo_user_id IS NULL (or drop it in favor of one) to avoid reading rows that will be filtered out. Not required for correctness.
  • idx_free_model_usage_created_at — unaffected (admin analytics / cleanup cron).

No write-amplification change: every insert already updates the same set of indexes (the partial user index only fires for authenticated rows, which is unchanged).

…ed requests

Authenticated free-model requests are now rate-limited per user id regardless of
feature or source IP. Anonymous requests continue to be rate-limited per IP,
counting only anonymous usage so they aren't skewed by authenticated users on
shared IPs. This removes the feature/Cloudflare-IP special case that existed for
cloud-agent, code-review and app-builder.

kilo-code-bot Bot commented May 2, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (4 files)
  • apps/web/src/app/admin/api/free-model-usage/stats/route.ts
  • apps/web/src/app/admin/components/FreeModelUsageStats.tsx
  • apps/web/src/app/admin/components/UserRateLimitStats.tsx
  • apps/web/src/app/admin/free-model-usage/page.tsx

Reviewed by gpt-5.5-2026-04-23 · 1,170,243 tokens

chrarnoldus and others added 5 commits May 4, 2026 11:34
… limit

- Rate Limit Testing now operates on the admin's user id (authenticated)
  instead of their IP; previously the inserted anonymous rows had no effect
  on the admin's own per-user limit.
- Stats endpoint splits "at limit" into anonymous IPs (anonymous-only
  count) and authenticated users (per-user count), matching the two gates.
- Updated page and card copy to describe the hybrid user/IP semantics.
…uter route

Remove the `noFreeModelsAvailableResponse` import and the corresponding check
after `applyResolvedAutoModel` in the OpenRouter API route. Also includes
minor formatting updates to admin components.
Comment thread apps/web/src/app/api/openrouter/[...path]/route.ts Outdated
kilo-code-bot Bot added 4 commits May 5, 2026 10:57
Merge with main dropped the autoResult.kind check, so kilo-auto requests
with no free candidates continued through rate limiting and were sent
upstream as the synthetic auto-model id instead of returning the 503.
Also re-apply oxfmt formatting (repo oxfmt 0.40.0).

Use the actual x-forwarded-for IP instead of a sentinel string so the
inserted rows remain useful in analytics and match the shape of normal
free-model usage rows.

The two rate-limit gate cards (anonymous IPs and users) are the
actionable signal on this page. Accent them with a primary border at
rest and a destructive border + red number when the count is non-zero.
@lambertjosh

@chrarnoldus - maybe we could have the user-based sign-in rate limit be its own little section, so you can see how many users are affected and which actual user ids, like you can with IPs?

Per Josh's review feedback, surface the per-user rate-limit signal as
its own section showing both the count of users at the limit and a
table of the actual user ids (with name/email/avatar via the existing
UserAvatarLink). Removes the now-duplicate 'Users at Limit' card from
the aggregate stats grid.
@lambertjosh

@chrarnoldus - I'm still seeing a similar set of cards?
