Skip to content

Implement rate limit error response handling and cleanup#3

Open
FrancoStino wants to merge 1 commit into
cortexkit:mainfrom
FrancoStino:fix/switch-model-rate-limit
Open

Implement rate limit error response handling and cleanup#3
FrancoStino wants to merge 1 commit into
cortexkit:mainfrom
FrancoStino:fix/switch-model-rate-limit

Conversation

@FrancoStino

@FrancoStino FrancoStino commented Jun 20, 2026

Copy link
Copy Markdown
  • Introduced a new response function to handle rate limit errors with HTTP 403 status.
  • Updated existing error responses to utilize the new function for consistency.
  • Cleaned up unused code and improved readability in related functions.

View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.


Summary by cubic

Standardized rate limit handling by introducing a 403 error response with optional Retry-After and routing all quota-related errors through it. This gives clearer, consistent messages when accounts hit soft or hard limits.

  • Refactors

    • Added createRateLimitErrorResponse and used it for soft-quota blocks, all-accounts rate limited, and switch-exhausted paths.
    • Replaced older synthetic error responses; unified wording and wait-time hints.
    • Simplified helpers and removed unused code (e.g., capacity backoff tiers, account-level reset, unused vars); improved URL regex and boolean checks.
  • Migration

    • Clients should treat HTTP 403 as rate-limit/quota exhaustion and honor Retry-After if present.

Written for commit 774b254. Summary will update on new commits.

Review in cubic

…dling

- Introduced a new response function to handle rate limit errors with HTTP 403 status.
- Updated existing error responses to utilize the new function for consistency.
- Cleaned up unused code and improved readability in related functions.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.

Re-trigger cubic

@FrancoStino

Copy link
Copy Markdown
Author

/codesmith review this PR

@blacksmith-sh

blacksmith-sh Bot commented Jun 20, 2026

Copy link
Copy Markdown

I'm on it! Give me a moment to analyze the code and I'll respond shortly.

View with Codesmith

@blacksmith-sh

blacksmith-sh Bot commented Jun 20, 2026

Copy link
Copy Markdown

@FrancoStino

/codesmith review this PR

I've completed my review. Here's what I found.

Verdict

The core idea, routing quota/rate-limit dead-ends through an HTTP 403 (instead of createSyntheticErrorResponse's 200+SSE) so OpenCode emits session.error and OmoSlim's fallback fires instead of burning the SDK's 429 retry loop, is sound and the rationale is well documented in the new function. The dead-code cleanup is legitimate and safe. But there's one real regression, one correctness question, and a significant reviewability problem.

Should fix (regression)

plugin.ts:151 changes enabled semantics. The cleanup rewrote if (!account || account.enabled === false) return; to if (!account || !account.enabled) return;. These are not equivalent: enabled is optional (storage.ts:209 enabled?: boolean), and this codebase intentionally treats undefined as enabled. That invariant is documented (CHANGELOG line 205: "getEnabledAccounts now treats undefined as enabled... accounts without an explicit enabled field were incorrectly filtered out") and enforced everywhere else: accounts.ts getEnabledAccounts() filters account.enabled !== false, and plugin.ts:642, :674, :2930 all use === false / !== false. With !account.enabled, triggerAsyncQuotaRefreshForAccount now early-returns for any account whose enabled is undefined (legacy/implicitly-enabled accounts), so their quota never gets background-refreshed. Revert this one to account.enabled === false.

Should verify

The 403 body doesn't always contain "rate limit". The new function's own doc comment says the body must match OmoSlim's isRateLimitError regex (/rate.?limit/i) to trigger fallback. But 3 of the 5 call sites pass messages that don't contain that substring: the soft-quota block (plugin.ts:1761, "Quota protection: All N account(s) are over X% usage..."), "All Antigravity endpoints failed" (:2813), and "All Antigravity accounts failed" (:2841), plus the max-switch path (:2788) whenever lastError.message is non-empty. If OmoSlim keys its fallback decision off the body text, those paths will surface a hard 403 rather than falling back. Either ensure those messages contain "rate limit"/"rate-limited", or confirm OmoSlim keys off the 403 status / X-Antigravity-Error-Type header instead of the body. (OmoSlim isn't in this repo, so I can't verify which.)

Naming/labeling mismatch. createRateLimitErrorResponse hardcodes X-Antigravity-Error-Type: rate_limit_all_accounts, but it's now also the terminal response for generic "all endpoints/accounts failed" cases that may be network or auth failures (via lastError), not rate limits. That mislabels non-rate-limit failures. Consider a more general name/header, or pass the error type in.

Reviewability (please address)

request-helpers.ts was reformatted wholesale. The actual change there is ~35 lines (the new function plus a few simplifications), but the diff is 4021 lines because the entire file was reindented from 2-space to 4-space and the imports were restructured. There's no prettier/biome/editorconfig in the repo, and everything else, including plugin.ts in this same PR and the sibling request.ts, uses 2-space indentation. So this churn buries the real change, makes the file inconsistent with the rest of the codebase, pollutes git blame, and invites merge conflicts. I'd strongly recommend reverting the formatting-only diff and keeping 2-space, leaving just the genuine edits. (I had to use git diff -w to extract the real change for this review.)

Nits / FYI (all verified safe, non-blocking)

  • Dead-code removal (CAPACITY_BACKOFF_TIERS_MS, getCapacityBackoffDelay, resetAllRateLimitStateForAccount, and the unused locals cliFirst, waitSecValue, maxBackoffMs, isDuplicate, the duplicate accountLabel/quotaName, parseRefreshParts/storedAccounts) all confirmed to have no remaining references. Good cleanup. status is assigned on every branch and responseBody is initialized at declaration, so dropping their initializers is fine.
  • Removed imports cacheSignature and SKIP_THOUGHT_SIGNATURE from request-helpers.ts: confirmed unused in that file (they're still used in request.ts), so safe.
  • tryMergeEnumFromUnion: if (option.type && !option.const && !option.enum)if (option.type) is effectively equivalent (by that point const is undefined and enum is absent), except a degenerate { type, enum: [] } now returns null where it previously didn't, which is arguably more correct.
  • The 200-SSE → 403 switch is a real client-visible behavior change, but intentional and already covered by the PR's migration note.
  • No test covers the triggerAsyncQuotaRefreshForAccount enabled path; given it just regressed, a small unit test guarding "undefined enabled is treated as enabled" would be worth adding (there's already refresh-queue.test.ts exercising enabled: false).

Net: fix the enabled === false regression, resolve the "rate limit" body/regex question, and drop the request-helpers.ts reformatting so the diff reflects the ~35 lines that actually changed. Happy to push those fixes to the branch if you want.

View with Codesmith

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant