Closed
27 commits
664c616
feat(anthropic): add OAuth support and handle streaming nulls
b3nw Apr 5, 2026
ebe712d
feat(chutes): dollar quota tracking with sliding window
b3nw Apr 5, 2026
d9c9797
feat(codex): Responses API rewrite, dynamic model discovery, and OAut…
b3nw Apr 5, 2026
8bf92a7
feat(opencode_zen): add opencode_zen provider with routing
b3nw Apr 5, 2026
66d56d7
feat(nanogpt): native Anthropic routing, streaming fallback, and quot…
b3nw Apr 11, 2026
9877fd4
fix(gemini-cli): fast-fail on non-rotatable errors and pro quota hand…
b3nw Apr 5, 2026
af05644
feat(vertex): Vertex AI Express Mode provider with x-goog-api-key auth
b3nw Apr 23, 2026
8a05d30
feat(opencode_go): add Opencode Go provider with 3-window quota track…
b3nw Apr 30, 2026
9f81ffa
feat(command_code): add Command Code provider with plan bypass routin…
b3nw Jun 1, 2026
2676959
feat(kilocode): add credit balance tracking via web session cookie
b3nw Jun 8, 2026
32955d4
feat(copilot): GitHub Copilot provider with OAuth device flow, plan-b…
b3nw Apr 15, 2026
4e7ec14
feat(core): infrastructure improvements - latest aliases, error stand…
b3nw Apr 13, 2026
a98d301
feat(health): add health & diagnostics endpoints (/v1/health, /v1/hea…
b3nw Apr 18, 2026
142021b
feat(proxy): outbound HTTP/SOCKS5 proxy support with per-provider/cre…
b3nw May 8, 2026
5bcae1d
feat(usage): add monthly budget and RPD quota guards
b3nw Jun 9, 2026
4ed7076
fix(fallback): enable MODEL_FALLBACK for streaming requests
b3nw Jun 9, 2026
b00a36c
feat(model-routing): MODEL_ALIASES and cross-provider rotation
b3nw Apr 5, 2026
a2483d5
feat(tui): transaction viewer, compact displays, and detail views
b3nw Apr 5, 2026
713386d
feat(tests): add local test suite (153 tests, zero-cost, no network)
b3nw Apr 12, 2026
7168760
feat(tooling): add AGENTS.md and .agent/ config for linear stack work…
b3nw Apr 23, 2026
579c368
feat(webui): add React web UI with admin dashboard, quota viewer, log…
b3nw May 22, 2026
b752022
feat(ci): fork-aware release notes with incremental topic diff
b3nw May 26, 2026
b705a0e
feat(xai): add xAI Grok OAuth provider with PKCE and Device Code flows
b3nw Jun 13, 2026
602e3d4
feat(gemini-cli): expose gemini-3.5-flash, unify quota groups, fix to…
b3nw Jun 18, 2026
afec625
feat(xai): enable xAI Grok device-code OAuth in admin WebUI
b3nw Jun 18, 2026
ded3780
feat(ci): fork-aware release notes with incremental topic diff
b3nw May 26, 2026
0671009
feat(umans): add Umans provider with request-based quota tracking
b3nw Jun 22, 2026
5 changes: 4 additions & 1 deletion .dockerignore
@@ -22,10 +22,13 @@ ENV/

# Build
*.egg-info/
dist/
build/
.eggs/

# Web UI (only package.json, package-lock.json, and src/ are needed)
webui/node_modules
webui/dist

# Logs (will be mounted as volume)
logs/

120 changes: 120 additions & 0 deletions .env.example
@@ -234,6 +234,33 @@
# Examples:
# QUOTA_GROUPS_GEMINI_CLI_PRO="gemini-2.5-pro,gemini-3-pro-preview"

# --- Model Fallback / Spillover ---
# Configure fallback providers for specific models. When a prefixed request
# (e.g., google/gemma-4-31b-it) exhausts all credentials on the primary
# provider due to scaling issues or errors, the proxy will automatically
# try the listed fallback providers in order.
#
# This does NOT affect unprefixed requests (those use MODEL_ALIAS instead).
# Fallback only triggers on transient provider failures (5xx, rate limits,
# connection errors). Request-level errors (400, 401, 403) are never retried.
#
# Format: MODEL_FALLBACK_<MODEL_NAME>=provider1[:model1],provider2[:model2][|retry_mode]
#
# Model name: dashes → underscores, dots → underscores, uppercased
# gemma-4-31b-it → GEMMA_4_31B_IT
#
# If no :model is specified after a provider, the original model name is used.
#
# Retry mode (appended after |):
# exhaust - (Default) Try all credentials on each fallback provider
# before moving to the next. Gives each provider a full shot.
# round_robin - Try one credential per provider, cycling through.
#
# Examples:
# MODEL_FALLBACK_GEMMA_4_31B_IT="nvidia_nim,ollama_cloud"
# MODEL_FALLBACK_GEMMA_4_31B_IT="nvidia_nim:google/gemma-4-31b-it,ollama_cloud:gemma-4-31b-it|exhaust"
# MODEL_FALLBACK_DEEPSEEK_V3="nvidia_nim,google|round_robin"
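
The format described above can be sketched in Python. This is only an illustration of the documented syntax, not the proxy's actual parser: the names `env_key_for_model`, `parse_fallback`, and `FallbackTarget` are hypothetical, and rejecting an unknown retry mode is an assumption.

```python
from dataclasses import dataclass

# Retry modes named in the comments above.
VALID_RETRY_MODES = {"exhaust", "round_robin"}


@dataclass(frozen=True)
class FallbackTarget:
    provider: str
    model: str


def env_key_for_model(model: str) -> str:
    """Dashes and dots become underscores, then the name is uppercased."""
    return "MODEL_FALLBACK_" + model.replace("-", "_").replace(".", "_").upper()


def parse_fallback(value: str, original_model: str) -> tuple[list[FallbackTarget], str]:
    """Split 'provider1[:model1],provider2[:model2][|retry_mode]' into parts."""
    spec, _, mode = value.partition("|")
    mode = mode.strip() or "exhaust"  # documented default
    if mode not in VALID_RETRY_MODES:
        # Assumption: the real proxy may handle unknown modes differently.
        raise ValueError(f"unknown retry mode: {mode!r}")

    targets: list[FallbackTarget] = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        provider, _, model = entry.partition(":")
        # With no ':model' part, the original model name is reused.
        targets.append(FallbackTarget(provider.strip(), model.strip() or original_model))
    return targets, mode


if __name__ == "__main__":
    print(env_key_for_model("gemma-4-31b-it"))
    print(parse_fallback("nvidia_nim,google|round_robin", "deepseek-v3"))
```

Splitting on the first `:` only keeps a model override such as `google/gemma-4-31b-it` intact, since the provider name comes before the first colon.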

# ------------------------------------------------------------------------------
# | [ADVANCED] Fair Cycle Rotation |
# ------------------------------------------------------------------------------
@@ -369,6 +396,99 @@
# Default: 8085
# GEMINI_CLI_OAUTH_PORT=8085

# ------------------------------------------------------------------------------
# | [CODEX] OpenAI Codex Provider Configuration |
# ------------------------------------------------------------------------------
#
# Codex provider uses OAuth authentication with OpenAI's ChatGPT backend API.
# Credentials are stored in oauth_creds/ directory as codex_oauth_*.json files.
#

# --- Reasoning Effort ---
# Controls how much "thinking" the model does before responding.
# Higher effort = more thorough reasoning but slower responses.
#
# Available levels (model-dependent):
# - low: Minimal reasoning, fastest responses
# - medium: Balanced (default)
# - high: More thorough reasoning
# - xhigh: Maximum reasoning (gpt-5.2, gpt-5.2-codex, gpt-5.3-codex, gpt-5.1-codex-max only)
#
# Can also be controlled per-request via:
# 1. Model suffix: codex/gpt-5.2:high
# 2. Request param: "reasoning_effort": "high"
#
# CODEX_REASONING_EFFORT=medium

# --- Reasoning Summary ---
# Controls how reasoning is summarized in responses.
# Options: auto, concise, detailed, none
# CODEX_REASONING_SUMMARY=auto

# --- Reasoning Output Format ---
# How reasoning/thinking is presented in responses.
# Options:
# - think-tags: Wrap in <think>...</think> tags (default, matches other providers)
# - raw: Include reasoning as-is
# - none: Don't include reasoning in output
# CODEX_REASONING_COMPAT=think-tags

# --- Identity Override ---
# When true, injects an override that tells the model to prioritize
# user-provided system prompts over the required opencode instructions.
# CODEX_INJECT_IDENTITY_OVERRIDE=true

# --- Instruction Injection ---
# When true, injects the required opencode system instruction.
# Only disable if you know what you're doing (API may reject requests).
# CODEX_INJECT_INSTRUCTION=true

# --- Empty Response Handling ---
# Number of retry attempts when receiving empty responses.
# CODEX_EMPTY_RESPONSE_ATTEMPTS=3

# Delay (seconds) between empty response retries.
# CODEX_EMPTY_RESPONSE_RETRY_DELAY=2

# --- OAuth Configuration ---
# OAuth callback port for Codex interactive authentication.
# Default: 8086
# CODEX_OAUTH_PORT=8086


# --- GitHub Copilot ---
# GitHub Copilot provider uses Device Flow OAuth.
# The GitHub OAuth token (long-lived) is used to derive short-lived
# Copilot API tokens (~30 min expiry, refreshed automatically).
#
# Numbered credential format (recommended for multiple accounts):
# COPILOT_1_GITHUB_TOKEN=gho_xxxxx (first GitHub account)
# COPILOT_2_GITHUB_TOKEN=gho_yyyyy (second GitHub account)
#
# Legacy single-credential format:
# COPILOT_GITHUB_TOKEN=gho_xxxxx
#
# Optional: override the default model list
# COPILOT_MODELS=gpt-4o,claude-sonnet-4,gemini-2.5-pro
#
# To obtain a GitHub OAuth token, run the proxy with --add-credential
# and select the Copilot provider, or use the interactive Device Flow
# by starting the proxy without any COPILOT env vars.

# --- KiloCode ---
# KiloCode is configured as a custom OpenAI-compatible provider.
# API key and base URL follow the standard pattern:
# KILO_API_BASE=https://api.kilo.ai/api/openrouter/
# KILO_API_KEY_1="your-kilo-api-key"
#
# Optional: credit balance monitoring via the Kilo web dashboard.
# Obtain this value from the browser cookie __Secure-next-auth.session-token
# after logging in to https://app.kilo.ai/profile
# The token auto-refreshes (~30-day TTL) and the proxy keeps it alive.
# If absent or expired, requests still work — quota simply shows as unknown.
#KILO_SESSION_TOKEN=""
#KILO_QUOTA_REFRESH_INTERVAL=600

# ------------------------------------------------------------------------------
# | [ADVANCED] Debugging / Logging |
# ------------------------------------------------------------------------------
172 changes: 172 additions & 0 deletions .fork/check-stack.py
@@ -0,0 +1,172 @@
#!/usr/bin/env python3
"""Validate the LLM-API-Key-Proxy fork stack metadata.

This script intentionally uses only the Python standard library so it can run in
fresh workspaces without installing project dependencies.
"""

from __future__ import annotations

import re
import subprocess
import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parents[1]
STACK = ROOT / ".fork" / "stack.yml"
FEATURES = ROOT / ".fork" / "features"
AGENTS = ROOT / "AGENTS.md"

SUBJECT_RE = re.compile(r"^\s*subject:\s+\"(?P<subject>.+)\"\s*$")
ID_RE = re.compile(r"^\s*- id:\s+(?P<id>[A-Za-z0-9_.-]+)\s*$")
DUP_FEATURE_RE = re.compile(r"^\s{4}(?P<feature>[A-Za-z0-9_.-]+):\s*$")
DUP_SUBJECT_RE = re.compile(r"^\s{6}-\s+\"(?P<subject>.+)\"\s*$")
PREFIX_RE = re.compile(r"^(?P<kind>feat|fix)\((?P<feature>[^)]+)\):")


def git(*args: str) -> str:
    return subprocess.check_output(["git", *args], cwd=ROOT, text=True)


def parse_manifest() -> tuple[dict[str, str], dict[str, str], dict[str, set[str]]]:
    text = STACK.read_text()
    ids: dict[str, str] = {}
    subjects: dict[str, str] = {}
    allowed_duplicates: dict[str, set[str]] = {}
    current_id: str | None = None
    in_allowed = False
    current_allowed: str | None = None

    for line in text.splitlines():
        if line.strip() == "allowed_duplicate_features:":
            in_allowed = True
            current_allowed = None
            continue
        if line.startswith("features:"):
            in_allowed = False
            current_allowed = None
            continue
        if in_allowed:
            m = DUP_FEATURE_RE.match(line)
            if m:
                current_allowed = m.group("feature")
                allowed_duplicates.setdefault(current_allowed, set())
                continue
            m = DUP_SUBJECT_RE.match(line)
            if m and current_allowed is not None:
                allowed_duplicates.setdefault(current_allowed, set()).add(m.group("subject"))
                continue

        m = ID_RE.match(line)
        if m:
            current_id = m.group("id")
            ids[current_id] = ""
            continue
        m = SUBJECT_RE.match(line)
        if m and current_id is not None:
            subjects[m.group("subject")] = current_id
            ids[current_id] = m.group("subject")
            current_id = None

    return ids, subjects, allowed_duplicates


def stack_subjects() -> list[str]:
    output = git("log", "--format=%s", "--reverse", "upstream/dev..HEAD")
    return [line for line in output.splitlines() if line]


def check_agents(errors: list[str]) -> None:
    text = AGENTS.read_text()
    release_notes = sum(1 for line in text.splitlines() if line.strip() == "### Release Notes")
    if release_notes != 1:
        errors.append(f"AGENTS.md must contain exactly one '### Release Notes' heading (found {release_notes})")
    if text.count("```") % 2:
        errors.append("AGENTS.md has unbalanced fenced code blocks")
    if any(line.strip() == "git add -A" for line in text.splitlines()):
        errors.append("AGENTS.md contains an executable `git add -A` example")
    for marker in ("<<<<<<<", ">>>>>>>"):
        if marker in text:
            errors.append(f"AGENTS.md contains conflict marker {marker}")
    if ".fork/features" not in text:
        errors.append("AGENTS.md must document .fork/features as canonical feature history")
    if "local workspace state" not in text.lower():
        errors.append("AGENTS.md must state that local workspace state is non-canonical")


def check_stack(errors: list[str]) -> None:
    ids, manifest_subjects, allowed_duplicates = parse_manifest()
    subjects = stack_subjects()
    stack_set = set(subjects)

    for subject in manifest_subjects:
        if subject not in stack_set:
            errors.append(f"manifest subject not found in stack: {subject}")

    for subject in subjects:
        if subject not in manifest_subjects:
            m = PREFIX_RE.match(subject)
            if not m:
                errors.append(f"stack commit lacks known manifest subject and feature prefix: {subject}")
                continue
            feature = m.group("feature")
            allowed = allowed_duplicates.get(feature, set())
            if subject not in allowed:
                errors.append(f"stack commit is not in manifest or allowed exceptions: {subject}")

    by_feature: dict[str, list[str]] = {}
    for subject in subjects:
        m = PREFIX_RE.match(subject)
        if not m:
            continue
        by_feature.setdefault(m.group("feature"), []).append(subject)

    for feature, feature_subjects in sorted(by_feature.items()):
        if len(feature_subjects) <= 1:
            continue
        allowed = allowed_duplicates.get(feature, set())
        unexpected = [s for s in feature_subjects if s not in allowed]
        manifest_for_feature = [s for s, fid in manifest_subjects.items() if fid == feature]
        # Multiple commits are allowed only when every commit is either the canonical
        # manifest subject for that feature or an explicitly documented exception.
        permitted = set(allowed) | set(manifest_for_feature)
        if any(s not in permitted for s in feature_subjects):
            errors.append(f"feature {feature!r} has unexpected duplicate stack commits: {feature_subjects}")

    for feature_id in ids:
        feature_file = FEATURES / f"{feature_id}.md"
        if not feature_file.exists():
            # Only require detailed histories for features that have a feature file
            # once they change under the new workflow. Keep stack-wide adoption
            # incremental instead of forcing 20+ stub docs on day one.
            continue
        text = feature_file.read_text()
        subject = ids[feature_id]
        if subject and subject not in text:
            errors.append(f"{feature_file} does not mention its stack subject")


def main() -> int:
    errors: list[str] = []
    if not STACK.exists():
        errors.append("missing .fork/stack.yml")
    if not FEATURES.exists():
        errors.append("missing .fork/features/")
    if not AGENTS.exists():
        errors.append("missing AGENTS.md")
    if not errors:
        check_agents(errors)
        check_stack(errors)

    if errors:
        print("fork stack validation failed:", file=sys.stderr)
        for err in errors:
            print(f"- {err}", file=sys.stderr)
        return 1

    print("fork stack validation passed")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
51 changes: 51 additions & 0 deletions .fork/features/ci.md
@@ -0,0 +1,51 @@
## 2026-06-21 — Fix release job failing when short SHA length differs between runners

Target: `feat(ci): fork-aware release notes with incremental topic diff` (`ea5f239`)

Files:
- `.github/workflows/build.yml`

Working commit before autosquash:
- TBD — created via `fixup! feat(ci): ...`

Final stack commit after autosquash:
- TBD — folded into `feat(ci): ...`

### Why

Run 27859339250 / job 82452676947 failed in **Generate Build Metadata**
with `find: 'release-assets': No such file or directory`.

Root cause: `git rev-parse --short HEAD` returns the minimum length
needed for SHA uniqueness in the local object DB — and that length is
not deterministic across runners. For run 27859339250 the build jobs
uploaded artifacts named `proxy-app-build-{Linux,macOS,Windows}-afec625`
(7 chars) while the release job filtered with
`proxy-app-build-*-afec6255` (8 chars). Zero artifacts matched, the
download step exited 0 anyway, and the next bash step (run with `set -e -o pipefail`)
crashed on the missing directory.

### Fix

1. Pin both `Get short SHA` steps (build job and release job) to
`git rev-parse --short=7 HEAD` so they always agree.
2. Add a defensive `Verify downloaded artifacts` step right after the
download that fails with a clear error and lists the available
artifacts when the download silently matched zero items.

### Verification

- `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/build.yml'))"` — OK
- 7-char SHA matches the length already in use for artifact names, so
no re-upload of historical artifacts is required.
- Recommended: re-run the failed workflow after the fix is folded into
the `feat(ci)` stack commit and pushed.

### Notes / risks

- A fully-orthogonal future fix is to pin everything to the full
40-char SHA — that decouples the artifact name from git's notion of
"short" entirely.
- Another option is to drop `pattern:` on `download-artifact@v4` and
filter by an explicit list (artifact IDs or full names) — `pattern:`
glob matching across multi-runner SHA lengths is a recurring foot-gun.
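
The mismatch described under "Why" can be reproduced in a few lines of Python. This is a rough illustration only: `fnmatchcase` stands in for the glob matching that `download-artifact` performs, and `matching_artifacts` and `verify_download` are hypothetical helpers, not code from the workflow.

```python
from fnmatch import fnmatchcase


def matching_artifacts(names: list[str], pattern: str) -> list[str]:
    """Return the artifact names that a glob pattern would select."""
    return [name for name in names if fnmatchcase(name, pattern)]


def verify_download(matched: list[str], available: list[str]) -> None:
    """Fail loudly, listing what exists, when the pattern matched nothing."""
    if not matched:
        raise RuntimeError(f"no artifacts matched; available: {available}")


# Build jobs uploaded artifacts named with a 7-char short SHA.
uploaded = [f"proxy-app-build-{os_name}-afec625" for os_name in ("Linux", "macOS", "Windows")]

# The release job computed an 8-char short SHA, so its pattern selects nothing.
mismatched = matching_artifacts(uploaded, "proxy-app-build-*-afec6255")

# With both jobs pinned to --short=7, the pattern selects all three artifacts.
pinned = matching_artifacts(uploaded, "proxy-app-build-*-afec625")

if __name__ == "__main__":
    print(mismatched)
    print(pinned)
    verify_download(pinned, uploaded)
```

The `verify_download` check mirrors the intent of the `Verify downloaded artifacts` step: an empty match becomes an immediate, descriptive failure instead of a later `find` error.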