feat(umans): add Umans provider with request-based quota tracking #164
claw-io wants to merge 27 commits into
…a cleanup
- Native Anthropic endpoint routing for Claude models via NanoGPT
- Anthropic format converters for OpenAI<->Anthropic message translation
- Fix streaming fallback: convert static ModelResponse to fake stream
- Fix stream parameter handling to prevent duplicate arguments
- Remove obsolete monthly quota group from NanoGPT provider
- Add embedding routing support in executor
- Clean up _anthropic_payload from kwargs before LiteLLM calls
Custom provider for Google Vertex AI Express Mode API keys that uses x-goog-api-key header authentication against the Vertex AI OpenAI-compatible endpoint. Supports non-streaming and streaming chat completions with automatic model discovery. Models are prefixed as vertex/ (e.g. vertex/gemini-3.1-flash-lite-preview). Env vars: VERTEX_PROJECT, VERTEX_LOCATION, VERTEX_API_KEY_N
…ing and scraped balance
…g and credit monitoring
Track KiloCode credit balance through the Kilo web dashboard session cookie. Fetches /api/user on a background interval and surfaces the balance as credits($) in the TUI. The session token is auto-refreshed on each poll via /api/auth/session.
…ased model filtering, and enhanced X-Initiator heuristic
…ardization, and utilities
Core infrastructure improvements:
- Smart 'latest' model alias resolution with cost-based tiebreaking
- Standardized error responses with proper HTTP status codes and error.code field
- ProxyExhaustionError for structured credential exhaustion reporting
- TerminalRequestError for non-rotatable errors (404, model not found)
- Per-provider retry count override via MAX_RETRIES_{PROVIDER} env var
- Retry 429 rate_limit errors with backoff instead of rotating
- Cached token pricing in streaming cost calculation
- Split quota stats into current_period and global/lifetime views
- Log rotation for proxy.log and proxy_debug.log (RotatingFileHandler)
- Include latest virtual models in /v1/models endpoint
- Resolve singleton cache pollution for dynamic providers
- Fork-specific README and .gitignore updates
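The per-provider retry override and the 429 backoff listed above could be sketched roughly as follows. Only the `MAX_RETRIES_{PROVIDER}` variable pattern comes from the commit message; the helper names, the default of 3 retries, the hyphen-to-underscore normalization, and the exponential schedule are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_RETRIES = 3  # assumed default; the real value is not stated in the PR


def max_retries_for(provider: str, env: Optional[Mapping[str, str]] = None) -> int:
    """Read MAX_RETRIES_{PROVIDER}, falling back to the assumed default."""
    env = os.environ if env is None else env
    # Assumption: provider names are upper-cased and hyphens become underscores.
    key = "MAX_RETRIES_" + provider.upper().replace("-", "_")
    raw = env.get(key)
    if raw is None:
        return DEFAULT_MAX_RETRIES
    try:
        return max(0, int(raw))
    except ValueError:
        # Ignore malformed values instead of failing the request.
        return DEFAULT_MAX_RETRIES


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential schedule in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]
```

A caller handling a 429 would sleep for each delay in turn on the same credential before giving up, rather than rotating to the next credential immediately.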
…dential routing
Add configurable proxy routing for outbound LLM API traffic.
ProxyConfig supports:
- Global default proxy (PROXY_URL_DEFAULT)
- Per-provider proxies (PROXY_URL_<PROVIDER>)
- Per-credential proxies (PROXY_URL_CREDENTIAL_<STABLE_ID>)
- Rotation pool with round-robin/random strategy
- JSON file config (PROXY_CONFIG_PATH) for complex setups
Resolution priority: per-credential > per-provider > rotation > global > direct.
Supports http, https, socks5, socks5h, and socks4 schemes. Prefers socks5h:// (remote DNS) over socks5:// (local DNS) to avoid resolution failures in containerized environments.
ProxiedClientPool manages httpx.AsyncClient instances per proxy URL. LiteLLM integration uses openai.AsyncOpenAI with proxied http_client to satisfy internal OpenAI-compatible code paths.
Requires: socksio (added to requirements.txt)
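The resolution priority described in this commit can be illustrated with a small lookup function. This is a sketch under assumptions, not the PR's `ProxyConfig`: the function name `resolve_proxy`, its parameters, and the round-robin counter are hypothetical; only the environment variable names and the priority order come from the commit message.

```python
from typing import Mapping, Optional, Sequence


def resolve_proxy(
    provider: str,
    credential_id: str,
    env: Mapping[str, str],
    rotation_pool: Optional[Sequence[str]] = None,
    counter: int = 0,
) -> Optional[str]:
    """Return the proxy URL to use, or None for a direct connection."""
    # 1. Per-credential proxy wins over everything else.
    per_credential = env.get(f"PROXY_URL_CREDENTIAL_{credential_id}")
    if per_credential:
        return per_credential
    # 2. Per-provider proxy.
    per_provider = env.get(f"PROXY_URL_{provider.upper()}")
    if per_provider:
        return per_provider
    # 3. Rotation pool (round-robin shown here; the PR also mentions random).
    if rotation_pool:
        return rotation_pool[counter % len(rotation_pool)]
    # 4. Global default, otherwise 5. direct (None).
    return env.get("PROXY_URL_DEFAULT") or None
```

For example, with `PROXY_URL_UMANS` and `PROXY_URL_DEFAULT` both set, a Umans credential without its own override resolves to the provider proxy, while any other provider falls through to the global default.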
Add two new per-credential quota systems integrated into the limit
engine pipeline, with full UI support in TUI and WebUI:
Monthly Budget: env-driven spending cap (MONTHLY_BUDGET_{PROVIDER}=N)
that blocks credentials once cumulative cost exceeds the budget.
Resets on a configurable day of the month.
RPD (Requests Per Day): env-driven per-model daily request caps
(RPD_LIMIT_{PROVIDER}_{MODEL}=N) with alias support
(RPD_ALIAS_{PROVIDER}_{ALIAS}=canonical) for latest-model name
resolution. Counters reset at midnight in the configured timezone.
Both features are opt-in via environment variables with no hardcoded
defaults. Status is exposed in /v1/quota-stats and rendered in the
TUI summary/detail views and WebUI credential cards.
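The midnight reset for RPD counters could work along these lines. The class below is a hypothetical sketch, not the limit-engine code from this PR: `DailyRequestCounter` and `try_acquire` are invented names, and the timezone is passed as a `tzinfo` object (a `zoneinfo.ZoneInfo` instance would also fit).

```python
from datetime import datetime, timedelta, timezone, tzinfo


class DailyRequestCounter:
    """Hypothetical per-model counter that resets at local midnight."""

    def __init__(self, limit: int, tz: tzinfo = timezone.utc):
        self.limit = limit
        self.tz = tz
        self.day = None  # local calendar date the current count belongs to
        self.count = 0

    def try_acquire(self, now: datetime) -> bool:
        """Count one request; return False once the daily cap is reached.

        `now` must be timezone-aware so the conversion below is unambiguous.
        """
        today = now.astimezone(self.tz).date()
        if today != self.day:
            # The local date changed, so the counter starts over.
            self.day, self.count = today, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

Keying the counter on the local calendar date avoids scheduling a reset job: the first request after midnight in the configured timezone simply observes a new date.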
…flow
Replaces the old manifest-driven multi-branch replay system with a simpler linear commit stack. Changes are made via fixup!/autosquash. Upstream syncs are a single git rebase.
Includes:
- AGENTS.md: entry point for all AI coding agents
- .agent/rules/claude.md: Claude-specific SSH/deployment notes
- .agent/rules/llm-proxy.md: container layout and deployment pipeline
- .agent/skills/upstream-sync/SKILL.md: sync workflow reference
… explorer, and settings
Replace TUI with a React/TypeScript/Tailwind web interface served from FastAPI.
Backend: new admin API routers for /v1/admin/transactions, /v1/admin/failures, /v1/admin/config, /v1/admin/credentials, /v1/admin/oauth/providers, and a /v1/ws WebSocket endpoint for real-time quota/error updates. Static file serving under /ui with SPA fallback.
Frontend: Vite + React 19 + Tailwind v4 + shadcn-style components.
Pages: Dashboard (health, providers, errors), Quota (per-provider/credential drill-down with progress bars), Log Explorer (transactions + failures with JSON viewer), Credentials (API key + OAuth management), Models (searchable catalog), Settings (config, filters, aliases, custom providers).
Auth via Bearer token stored in localStorage with remote proxy URL support.
Multi-stage Dockerfile adds Node build stage for the webui dist.
- Switch git-cliff range from $LAST_TAG..HEAD to upstream/dev..HEAD
- Add incremental diff step comparing fork_state markers between releases
- Embed state markers in release body for next build consumption
- Add upstream sync reference line to release notes
- Drop broken tag-hunting logic (~90 lines) that relied on orphaned tags
- Add Release Notes subsection and Rule 8 to AGENTS.md
- xai_auth_base.py: OAuth2 base class inheriting from OpenAIOAuthBase
  - PKCE Authorization Code flow with loopback redirect (127.0.0.1:56121)
  - Device Code flow for headless environments
  - Auto-selects Device Code in headless, offers choice in interactive
  - Uses xAI public client ID (b1a00492-073a-47ea-816f-4c329264a828)
  - Scopes: openid, offline_access, grok-cli:access
- xai_provider.py: ProviderInterface implementation
  - Routes through LiteLLM's native xai/ prefix
  - Resolves OAuth credential files to bearer tokens
  - Live model discovery from https://api.x.ai/v1/models
  - Supports acompletion and aembedding
- Register xai in provider_factory, credential_manager, credential_tool
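For readers unfamiliar with PKCE, the verifier/challenge pair used by an Authorization Code flow is typically generated as in RFC 7636. The sketch below assumes the S256 challenge method; the commit message does not say which method `xai_auth_base.py` uses, and the function names here are hypothetical.

```python
import base64
import hashlib
import secrets


def pkce_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple:
    """Return (code_verifier, code_challenge) for one authorization attempt."""
    # 32 random bytes encode to a 43-character URL-safe verifier.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    return verifier, pkce_challenge(verifier)
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code alone cannot be redeemed.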
…ken counts
- Add gemini-3.5-flash to AVAILABLE_MODELS (maps to gemini-3-flash via CCPA)
- Fix is_gemini_3_flash check so 3.5-flash uses thinkingLevel (not thinkingBudget)
- Merge 25-flash + 3-flash into single "flash" quota group (verified via matching reset timestamps on live quota API)
- Unify flash-lite models (2.5 + 3.1) into single "flash-lite" quota group
- Add gemini-3.5-flash to DEFAULT_MAX_REQUESTS for PRO/FREE tiers
- Filter stale quota groups from usage manager to prevent display of renamed/merged groups in quota UI
- Fix streaming token counts: the final SSE chunk carrying usageMetadata often contains only a thoughtSignature (empty text), causing all parts to be skipped and token counts to be lost (reported as 0/0 in logs). Emit a usage-only chunk when no parts yield content but usageMetadata is present.
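The streaming token-count fix can be pictured with a simplified chunk converter. This is an illustrative sketch, not the provider's actual code: `convert_chunk` and `_usage` are hypothetical names, and the output shape is a reduced OpenAI-style chunk. The input fields (`candidates`, `parts`, `usageMetadata`, `promptTokenCount`, `candidatesTokenCount`, `totalTokenCount`) follow the Gemini response format.

```python
from typing import Any, Dict, List


def _usage(meta: Dict[str, Any]) -> Dict[str, int]:
    """Map Gemini usageMetadata onto OpenAI-style usage keys."""
    return {
        "prompt_tokens": meta.get("promptTokenCount", 0),
        "completion_tokens": meta.get("candidatesTokenCount", 0),
        "total_tokens": meta.get("totalTokenCount", 0),
    }


def convert_chunk(chunk: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert one Gemini SSE chunk into zero or more output chunks."""
    candidates = chunk.get("candidates") or [{}]
    content = candidates[0].get("content") or {}
    out: List[Dict[str, Any]] = []
    for part in content.get("parts") or []:
        text = part.get("text")
        if text:  # parts carrying only a thoughtSignature have empty text
            out.append({"choices": [{"delta": {"content": text}}]})
    meta = chunk.get("usageMetadata")
    if meta:
        if out:
            out[-1]["usage"] = _usage(meta)
        else:
            # No part produced content: emit a usage-only chunk so the
            # token counts are not lost.
            out.append({"choices": [], "usage": _usage(meta)})
    return out
```

Without the `else` branch, a final chunk whose only part is an empty-text `thoughtSignature` would yield nothing, which matches the 0/0 symptom described in the commit.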
Add x-ai to the proxy's admin OAuth API so the WebUI Credentials page can onboard xAI Grok accounts via the device-code flow.
src/proxy_app/api/oauth.py:
- Add 'x-ai' to PROVIDER_META with flow_type=device_code.
- Add dispatcher branch in start_oauth_flow for 'x-ai'.
- Implement _start_xai_device_flow, _poll_xai_device, and _finalize_xai parallel to the copilot device-code path. Reuses XAI_CLIENT_ID, XAI_OAUTH_SCOPES, XAI_DEVICE_CODE_URL, XAI_TOKEN_URL, and XAI_USERINFO_URL from XAiAuthBase; no constant duplication.
- Persists credentials via the existing _save_credential_file helper with the exact shape XAiAuthBase's loader expects (access_token, refresh_token, expiry_date, account_id, _proxy_metadata).
No frontend changes. The existing Credentials.tsx dialog already handles flow_type=device_code generically (user_code, verification_uri, polling, copy-to-clipboard), so adding the provider to PROVIDER_META is sufficient to surface it in the 'Add OAuth' dialog.
tests/test_xai_oauth_flow.py: 6 test cases covering the providers list, the start envelope (with upstream mock), the unknown-provider regression, the status endpoint, and the credential-file prefix. Un-ignore via .gitignore per the established pattern for tracked test files in this repo.
Too many files changed for review.
Important: Review skipped. Too many files! This PR contains 194 files, which is 44 over the limit of 150.
Files ignored due to path filters: 6.
Opened against wrong repo by automation; will recreate in the b3nw fork.
Starting my review of the Umans provider addition, focusing on the quota tracker mixin, provider integration, and test coverage. I'll also scan the broader diff for repo-hygiene issues since it includes the full fork stack. Report coming shortly.
Description
This PR adds first-class support for the Umans API to the proxy.
Files Changed
Verification
platform linux -- Python 3.13.5, pytest-9.1.1, pluggy-1.6.0 -- /opt/data/workspace/developer/repos/forks/b3nw/LLM-API-Key-Proxy/worktrees/feat-umans-provider-quota/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /opt/data/workspace/developer/repos/forks/b3nw/LLM-API-Key-Proxy/worktrees/feat-umans-provider-quota
configfile: pyproject.toml
plugins: mock-3.15.1, anyio-4.13.0, asyncio-1.4.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 23 items
tests/test_umans_quota_tracker.py::test_get_credential_identifier_env_path PASSED [ 4%]
tests/test_umans_quota_tracker.py::test_get_credential_identifier_masks_long_key PASSED [ 8%]
tests/test_umans_quota_tracker.py::test_get_credential_identifier_short_key_unmasked PASSED [ 13%]
tests/test_umans_quota_tracker.py::test_parse_iso_to_unix_z PASSED [ 17%]
tests/test_umans_quota_tracker.py::test_detect_plan_code_pro_inferred PASSED [ 21%]
tests/test_umans_quota_tracker.py::test_detect_plan_code_pro_explicit PASSED [ 26%]
tests/test_umans_quota_tracker.py::test_detect_plan_max PASSED [ 30%]
tests/test_umans_quota_tracker.py::test_resolve_request_limit_code_pro PASSED [ 34%]
tests/test_umans_quota_tracker.py::test_resolve_request_limit_code_pro_no_limit PASSED [ 39%]
tests/test_umans_quota_tracker.py::test_resolve_request_limit_max PASSED [ 43%]
tests/test_umans_quota_tracker.py::test_resolve_request_limit_max_ignores_positive_env PASSED [ 47%]
tests/test_umans_quota_tracker.py::test_resolve_request_limit_code_pro_env_override PASSED [ 52%]
tests/test_umans_quota_tracker.py::test_parse_usage_response_code_pro PASSED [ 56%]
tests/test_umans_quota_tracker.py::test_parse_usage_response_max PASSED [ 60%]
tests/test_umans_quota_tracker.py::test_parse_usage_response_code_pro_env_override PASSED [ 65%]
tests/test_umans_quota_tracker.py::test_store_baselines_to_usage_manager PASSED [ 69%]
tests/test_umans_quota_tracker.py::test_store_baselines_skips_request_group_for_max_plan PASSED [ 73%]
tests/test_umans_quota_tracker.py::test_fetch_initial_baselines_mixed PASSED [ 78%]
tests/test_umans_quota_tracker.py::test_provider_get_model_quota_group PASSED [ 82%]
tests/test_umans_quota_tracker.py::test_provider_get_models_success PASSED [ 86%]
tests/test_umans_quota_tracker.py::test_provider_parse_quota_error_rate_limit PASSED [ 91%]
tests/test_umans_quota_tracker.py::test_provider_parse_quota_error_not_quota PASSED [ 95%]
tests/test_umans_quota_tracker.py::test_provider_get_credential_concurrency_limit_from_cache PASSED [100%]
============================== 23 passed in 3.05s ============================== — 23 passed
........................................................................ [ 35%]
........................................................................ [ 53%]
........................................................................ [ 71%]
........................................................................ [ 89%]
.......................................... [100%]
=============================== warnings summary ===============================
tests/test_failure_logger.py: 10 warnings
/opt/data/workspace/developer/repos/forks/b3nw/LLM-API-Key-Proxy/worktrees/feat-umans-provider-quota/src/rotator_library/failure_logger.py:226: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
"timestamp": datetime.utcnow().isoformat(),
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
402 passed, 10 warnings in 4.06s — 402 passed
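The DeprecationWarning above points at `datetime.utcnow()` in failure_logger.py. A possible replacement, following the warning's own suggestion, is a timezone-aware timestamp. The helper name `utc_timestamp` is hypothetical, and `timezone.utc` is used instead of `datetime.UTC` only because it also works on Python versions before 3.11.

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """ISO 8601 timestamp in UTC from a timezone-aware datetime."""
    return datetime.now(timezone.utc).isoformat()
```

Note that the aware form ends in `+00:00`, which the naive `utcnow().isoformat()` output does not, so any consumer that parses the logged timestamp would need to tolerate the offset.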
Notes / Follow-ups