From e1fe9b60dd6b3381833094c4e76a134464562ce3 Mon Sep 17 00:00:00 2001
From: AmitChaubey <amit.katyayana@gmail.com>
Date: Tue, 16 Jun 2026 20:37:28 +0100
Subject: [PATCH] docs(byo-openai): add self-hosted vLLM behind
 OpenAI-compatible gateway Document the kagent -> gateway -> vLLM pattern,
 including the required --enable-auto-tool-choice / --tool-call-parser vLLM
 flags, gateway model identifier usage, and a status 400 troubleshooting note.

Signed-off-by: AmitChaubey <amit.katyayana@gmail.com>
---
 .../supported-providers/byo-openai/page.mdx   | 39 +++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/src/app/docs/kagent/supported-providers/byo-openai/page.mdx b/src/app/docs/kagent/supported-providers/byo-openai/page.mdx
index 9ce229a2..6e68c23e 100644
--- a/src/app/docs/kagent/supported-providers/byo-openai/page.mdx
+++ b/src/app/docs/kagent/supported-providers/byo-openai/page.mdx
@@ -58,6 +58,45 @@ You can bring your own model from an [OpenAI API-compatible](https://platform.op
 
 Good job! You added a model to kagent. Next, you can [create or update an agent](https://kagent.dev/docs/kagent/getting-started/first-agent) to use this model.
 
+## Self-hosted vLLM behind an OpenAI-compatible gateway
+
+A common self-hosted pattern places an OpenAI-compatible gateway such as [Bifrost](https://github.com/maximhq/bifrost) or [LiteLLM](https://docs.litellm.ai/) in front of a [vLLM](https://docs.vllm.ai/) server (kagent → gateway → vLLM). Configure it as an OpenAI-compatible provider as described above, with two additional requirements: enable tool calling in vLLM, and set the model name to the identifier the gateway routes on.
+
+### Enable tool calling in vLLM
+
+kagent sends a `tools` array with `tool_choice: "auto"` on every request. Its runtime registers a built-in `ask_user` tool on every agent, so this happens even when you configure no tools yourself. The vLLM backend therefore **must** be launched with `--enable-auto-tool-choice` and a `--tool-call-parser` that matches your model, or every agent turn fails.
+
+```bash
+vllm serve Qwen/Qwen2.5-7B-Instruct \
+  --enable-auto-tool-choice \
+  --tool-call-parser hermes
+```
+
+The correct `--tool-call-parser` depends on your model family. See the [vLLM tool calling docs](https://docs.vllm.ai/en/latest/features/tool_calling.html) for the current list. At the time of writing, Qwen2.5 uses `hermes` and Llama 3.1 uses `llama3_json`, but parser names change across vLLM releases, so check the docs for your model.
+
+### Use the gateway's model identifier
+
+Set `spec.model` to the identifier your gateway routes on (often provider-prefixed, such as `vllm/Qwen/Qwen2.5-7B-Instruct`), which can differ from the bare model name vLLM serves internally. Point `openAI.baseUrl` at the gateway (LiteLLM defaults to port `4000`, Bifrost to `8080`).
+
+```yaml
+apiVersion: kagent.dev/v1alpha2
+kind: ModelConfig
+metadata:
+  name: qwen-vllm-via-gateway
+  namespace: kagent
+spec:
+  apiKeySecret: kagent-my-provider
+  apiKeySecretKey: PROVIDER_API_KEY
+  provider: OpenAI
+  model: vllm/Qwen/Qwen2.5-7B-Instruct
+  openAI:
+    baseUrl: http://litellm.kagent.svc.cluster.local:4000/v1
+```
+
+### Troubleshooting: `provider API error (status 400)`
+
+If every agent message fails with a generic `status 400`, the most likely cause is that vLLM was started without `--enable-auto-tool-choice` and a matching `--tool-call-parser`. Because kagent always sends `tool_choice: "auto"`, vLLM rejects each request until automatic tool choice is enabled. Restart vLLM with both flags, as shown in the `vllm serve` command above, and retry.
+
 ## TLS Configuration
 
 To secure communication to LLMs with your own custom certificates, configure the TLS CA details in the `ModelConfig`. Then, your agents communicate with the LLM with those custom certificates. This feature is useful for internal or company-managed LLM servers.