LLM Interactive Proxy

Turn any compatible AI client into a safer, smarter, multi-provider agent platform.

LLM Interactive Proxy is a universal translation, routing, and control layer for modern AI clients. Point OpenAI-compatible apps, Anthropic tools, Gemini integrations, and agentic coding workflows at one local or shared endpoint, then gain routing, failover, built-in security, automated steering, session intelligence, observability, and cross-provider flexibility without rewriting your client.

If your current setup feels fragile, expensive, opaque, or locked to one vendor, this project is designed to change that.

It is a compatibility layer, a security layer, a traffic control plane, a debugging surface, and a workflow improver for serious agentic use.

Active Development: This project is continuously evolving with new backends, routing features, and reliability improvements. See the CHANGELOG for the latest additions.

  • Keep your existing clients - Change the endpoint, not the app.
  • Mix providers freely - Route across APIs, plans, OAuth accounts, model families, and protocol styles.
  • Control agents in production - Add guardrails, rewrites, diagnostics, and policy at the proxy layer.
  • Debug with evidence - Inspect exact wire traffic instead of guessing from symptoms.
| Without the proxy | With LLM Interactive Proxy |
| --- | --- |
| Each client is tied to one provider stack | One endpoint can serve many clients and many backend families |
| Provider switching often means code or config churn | Change routing instead of rewriting integrations |
| Agent safety is scattered across tools | Centralize redaction, tool controls, sandboxing, and command protection |
| Debugging depends on incomplete logs | Inspect exact wire traffic with captures and diagnostics |
| Token costs grow with long sessions | Use intelligent context compression and smarter routing to reduce spend |
| Protocol mismatch blocks experimentation | Use cross-protocol conversion to bridge Anthropic, OpenAI, Gemini, and more |

At a glance

Beyond basic forwarding, the proxy adds cross-protocol translation, tool safety, routing and failover, session-oriented features (including B2BUA-style handling), boundary-level CBOR captures, usage tracking, and built-in token-saving controls. Longer narratives, use-case lists, and feature tours live in the User Guide.

  • One endpoint, many clients - Keep existing OpenAI-, Anthropic-, and Gemini-style clients while changing routing behind the proxy.
  • Token-saving that actually matters - Shrink bloated sessions with stale-history compaction and content-aware tool-output compression.
  • Production-minded resilience - Use retries, failover, health tracking, and safeguards that respect streaming semantics.
  • Operational visibility - Inspect wire captures, diagnostics, and usage data instead of debugging blind.

Token Savings

Long coding sessions tend to waste tokens in two different ways: old tool results remain in history, and fresh tool outputs are often much more verbose than the model needs. The proxy addresses both problems separately.

  • Context Compaction - Replaces stale historical tool results with explicit stubs once newer results for the same resource exist later in the conversation.
  • Dynamic Tool Output Compression - Reduces the size of the remaining tool outputs during request preparation using content-aware strategies.
  • Designed to work together - Compaction removes outdated history first; dynamic compression then reduces the cost of the tool outputs that still matter.
  • Useful for real agent workloads - Especially helpful for repeated file reads, large grep/search results, verbose test output, logs, diffs, and long debugging sessions.
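
The compaction step can be pictured with a minimal sketch. The function name, message shape, and stub text below are illustrative assumptions, not the proxy's actual API: the idea is simply that every tool result for a resource, except the most recent one, is replaced with an explicit stub.

```python
# Illustrative sketch of stale-history compaction: keep only the newest
# tool result per resource and replace earlier ones with stubs.
# Message shape and stub wording are assumptions, not the proxy's API.

def compact_history(messages):
    latest = {}  # resource -> index of its newest tool result
    for i, msg in enumerate(messages):
        if msg.get("role") == "tool":
            latest[msg["resource"]] = i
    compacted = []
    for i, msg in enumerate(messages):
        if msg.get("role") == "tool" and latest[msg["resource"]] != i:
            # An older result for this resource exists later in history: stub it.
            msg = {**msg, "content": f"[stale result for {msg['resource']} elided]"}
        compacted.append(msg)
    return compacted

history = [
    {"role": "tool", "resource": "src/app.py", "content": "old 500-line dump"},
    {"role": "user", "content": "now re-read it"},
    {"role": "tool", "resource": "src/app.py", "content": "fresh contents"},
]
compacted = compact_history(history)
```

Only the stale first read is stubbed; the fresh read stays intact, which is what keeps repeated file reads from inflating the context.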

Start with the Token Saving Guide for the overall picture, then go deeper into Context Compaction and Dynamic Tool Output Compression.

Resilience & Reliability

The proxy includes built-in resilience features for production use:

  • Smart retry and failover - Automatic recovery from transient backend failures
  • Circuit breaker - Temporarily excludes unhealthy backends to prevent repeated failures
  • Streaming protection - Avoids retry after output has started, preventing corruption
  • Health monitoring - Tracks backend availability and performance
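
Conceptually, the failover-plus-circuit-breaker pattern looks like the sketch below. The backend callables and the unhealthy-set are made-up stand-ins for illustration, not the proxy's implementation:

```python
# Conceptual sketch: try backends in order, skipping any marked unhealthy
# (a crude circuit breaker), and fall through on transient failures.
# Backend names and callables here are stand-ins, not real connectors.

def call_with_failover(backends, request, unhealthy=frozenset()):
    errors = []
    for name, fn in backends:
        if name in unhealthy:          # circuit breaker: skip known-bad backends
            continue
        try:
            return name, fn(request)
        except Exception as exc:       # transient failure: try the next backend
            errors.append((name, exc))
    raise RuntimeError(f"all backends failed: {errors}")

def flaky(_request):                   # pretend primary that is currently down
    raise ConnectionError("upstream 502")

def healthy(request):                  # pretend fallback that works
    return f"answer to {request!r}"

name, result = call_with_failover([("openai", flaky), ("anthropic", healthy)], "hi")
```

The real proxy adds the streaming safeguard on top of this: once output has started flowing to the client, it will not silently restart the request against another backend.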

Configure via the resilience section in config.yaml or see the Failure Handling Guide.

Quick Start

1. Clone and install

git clone https://github.com/matdev83/llm-interactive-proxy.git
cd llm-interactive-proxy
python -m venv .venv

# Windows
.venv\Scripts\activate

# Linux/macOS
source .venv/bin/activate

python -m pip install -e .[dev]

If you want OAuth-oriented optional connectors, install the oauth extra:

python -m pip install -e .[dev,oauth]

2. Export at least one provider credential

# Example: OpenAI
export OPENAI_API_KEY="your-key-here"

3. Start the proxy

python -m src.core.cli --default-backend openai:gpt-4o

The proxy listens on http://localhost:8000 by default.

4. Point your client at the proxy instead of the vendor

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

See the full Quick Start Guide for additional setup, auth, and backend examples.

Auto-Append First Prompt

Automatically append text from a file to the first user message in each session. Useful for injecting context, instructions, or system prompts without modifying client code.

Usage:

# config.yaml
auto_append_first_prompt_filename: "./prompts/context.txt"

Or via CLI: --auto-append-first-prompt-file ./prompts/context.txt

The file is loaded once at startup and appended to the first user message of every new session. See Configuration Guide for details.
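
The effect on a session's first request can be sketched as follows (an illustrative sketch of the behavior described above; the function and flag names are made up, not the proxy's internals):

```python
# Illustrative: append extra text to the first user message of a session,
# exactly once. Names and message shape are assumptions for this sketch.

def append_first_prompt(messages, extra_text, already_applied=False):
    if already_applied or not extra_text:
        return messages, already_applied
    out = list(messages)
    for i, msg in enumerate(out):
        if msg["role"] == "user":
            out[i] = {**msg, "content": msg["content"] + "\n\n" + extra_text}
            return out, True
    return out, already_applied

messages = [{"role": "user", "content": "Hello!"}]
out, applied = append_first_prompt(messages, "Project uses Python 3.12.")
```

Subsequent requests in the same session pass `already_applied=True`, so the context is injected only once per session.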

Supported Frontend Interfaces

The proxy exposes standard API surfaces so existing clients can often work with little or no code changes.

  • OpenAI Chat Completions - /v1/chat/completions
  • OpenAI Responses - /v1/responses
  • OpenAI Models - /v1/models
  • Anthropic Messages - /anthropic/v1/messages
  • Dedicated Anthropic server - http://host:<anthropic_port>/v1/messages (only when anthropic_port / --anthropic-port / ANTHROPIC_PORT is set; often 8001)
  • Google Gemini v1beta - /v1beta/models and :generateContent
  • Diagnostics endpoint - /v1/diagnostics
  • Backend reactivation endpoint - /v1/diagnostics/backends/{backend_instance}/reactivate
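
The same conversation maps onto the different frontend styles roughly as follows. Field names follow the public OpenAI, Anthropic, and Gemini APIs; the bodies are abbreviated for illustration and omit optional fields:

```python
# Roughly equivalent request bodies for the three frontend protocol styles.
# Shapes are abbreviated; consult each vendor's API reference for full schemas.
prompt = "Hello!"

openai_body = {                     # POST /v1/chat/completions
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": prompt}],
}

anthropic_body = {                  # POST /anthropic/v1/messages
    "model": "claude-3-5-sonnet",
    "max_tokens": 1024,             # required by the Anthropic Messages API
    "messages": [{"role": "user", "content": prompt}],
}

gemini_body = {                     # POST /v1beta/models/<model>:generateContent
    "contents": [{"role": "user", "parts": [{"text": prompt}]}],
}
```

Because the proxy translates between these shapes, a client speaking any one of them can be routed to a backend speaking another.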

See Frontend API documentation for protocol details and compatibility notes.

Supported Backends

The backend catalog keeps growing. Documented backends currently include OpenAI, Anthropic, Google Gemini, and OpenRouter, along with OAuth-based connectors; see the backend documentation for the full, current list.

Routing & Model Selection

The proxy uses a flexible selector syntax for routing requests to backends:

Basic format: backend:model

--default-backend openai:gpt-4o
--default-backend anthropic:claude-3-5-sonnet

Failover chains: Use | to specify fallback backends

--default-backend "openai:gpt-4o|anthropic:claude-3-5-sonnet|openrouter:openai/gpt-4o"

Weighted routing: Use ^ to distribute traffic

--default-backend "[weight=3]openai:gpt-4^[weight=1]anthropic:claude-3-5-sonnet"

When a weighted branch fails before meaningful output starts, runtime recovery can re-roll within the same request by excluding the failed branch and choosing from the remaining weighted leaves.

With parameters: Pass model parameters in the selector

--default-backend "openai:gpt-4o?temperature=0.5&max_tokens=2000"
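
The basic grammar above can be sketched as a small parser (an illustration of the syntax, not the proxy's actual parser; it handles `|` chains and `?key=value` parameters but not `^` weights):

```python
from urllib.parse import parse_qsl

# Sketch: split a selector into failover branches, then each branch into
# backend, model, and optional parameters. Not the proxy's real parser;
# weighted (^ / [weight=N]) branches are intentionally out of scope here.

def parse_selector(selector):
    branches = []
    for branch in selector.split("|"):
        head, _, query = branch.partition("?")
        backend, _, model = head.partition(":")
        branches.append({
            "backend": backend,
            "model": model,
            "params": dict(parse_qsl(query)),
        })
    return branches

chain = parse_selector("openai:gpt-4o?temperature=0.5|anthropic:claude-3-5-sonnet")
```

Each branch is tried in order, so the first entry is the primary route and the rest form the failover chain.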

See the Technical Reference: Routing Selectors for detailed syntax rules and advanced usage.

Access Modes

The proxy supports two operational modes with different security assumptions:

  • Single User Mode - Default local-development mode with localhost-first behavior and support for OAuth connectors.
  • Multi User Mode - Shared or production mode with stronger authentication expectations and tighter connector rules.

Quick examples:

# Single User Mode
python -m src.core.cli

# Multi User Mode
python -m src.core.cli --multi-user-mode --host=0.0.0.0 --api-keys key1,key2

See Access Modes for the security model and deployment guidance.

Documentation Map

Development

# Run the test suite
python -m pytest

# Lint and auto-fix
python -m ruff check --fix .

# Format
python -m black .

See the Development Guide for architecture, contribution workflow, and extra dev scripts.

Support

GitHub Issues and Discussions.

License

This project is licensed under the GNU AGPL v3.0 or later.