From b58f9ef64f124a34bece7b1f27d3c5da2b0e17a1 Mon Sep 17 00:00:00 2001 From: Alejandro Ponce Date: Wed, 15 Apr 2026 17:41:07 +0300 Subject: [PATCH 1/4] Update MCP Optimizer tutorial to use K8s/vMCP path Replace the deprecated Python-based standalone optimizer tutorial with a Kubernetes-first walkthrough using VirtualMCPServer and EmbeddingServer CRDs. Update cross-references in the vMCP optimizer guide accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/toolhive/guides-vmcp/optimizer.mdx | 7 +- docs/toolhive/tutorials/mcp-optimizer.mdx | 455 +++++++++++++--------- 2 files changed, 277 insertions(+), 185 deletions(-) diff --git a/docs/toolhive/guides-vmcp/optimizer.mdx b/docs/toolhive/guides-vmcp/optimizer.mdx index aa24b63e..1b1a9000 100644 --- a/docs/toolhive/guides-vmcp/optimizer.mdx +++ b/docs/toolhive/guides-vmcp/optimizer.mdx @@ -10,9 +10,9 @@ number of tools exposed to clients can grow quickly. The optimizer addresses this by filtering tools per request, reducing token usage and improving tool selection accuracy. -For the desktop/CLI approach using the MCP Optimizer container, see the +For a step-by-step tutorial that walks through the full setup, see the [MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx). This guide covers the -Kubernetes operator approach using VirtualMCPServer and EmbeddingServer CRDs. +configuration details for the VirtualMCPServer and EmbeddingServer CRDs. 
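+For example, on an existing VirtualMCPServer the optimizer is enabled with a
+single field reference (a minimal sketch; the resource names are illustrative
+and other required fields such as `groupRef` are omitted):
+
+```yaml
+apiVersion: toolhive.stacklok.dev/v1alpha1
+kind: VirtualMCPServer
+metadata:
+  name: my-vmcp
+  namespace: toolhive-system
+spec:
+  embeddingServerRef:
+    name: my-embedding
+```
+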
## Benefits @@ -314,7 +314,8 @@ spec: ## Related information -- [MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx) - desktop/CLI setup +- [MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx) - end-to-end + Kubernetes setup - [Optimizing LLM context](../concepts/tool-optimization.mdx) - background on tool filtering and context pollution - [Configure vMCP servers](./configuration.mdx) diff --git a/docs/toolhive/tutorials/mcp-optimizer.mdx b/docs/toolhive/tutorials/mcp-optimizer.mdx index d0ebaaad..f8400c83 100644 --- a/docs/toolhive/tutorials/mcp-optimizer.mdx +++ b/docs/toolhive/tutorials/mcp-optimizer.mdx @@ -1,54 +1,57 @@ --- title: Reduce token usage with MCP Optimizer description: - Enable the MCP Optimizer to enhance tool selection and reduce token usage. + Deploy the MCP Optimizer on Kubernetes with Virtual MCP Server and an + EmbeddingServer to filter tools and reduce token usage. schema_type: tutorial --- ## Overview -The ToolHive MCP Optimizer acts as an intelligent intermediary between AI -clients and MCP servers. It provides tool discovery, unified access to multiple -MCP servers through a single endpoint, and intelligent routing of requests to -appropriate MCP tools. +The MCP Optimizer acts as an intelligent intermediary between AI clients and MCP +servers. It provides tool discovery, unified access to multiple MCP servers +through a single endpoint, and intelligent routing of requests to appropriate +MCP tools. :::note[Moving to vMCP] The optimizer is now integrated into [Virtual MCP Server (vMCP)](../guides-vmcp/optimizer.mdx), which provides the same tool filtering and token reduction at the team level. You can deploy it in -Kubernetes today, and a local experience is coming soon. This tutorial covers -the standalone CLI approach in the meantime. +Kubernetes today, and a local experience is coming soon. This tutorial walks you +through the Kubernetes deployment. 
::: -## About MCP Optimizer +In this tutorial, you deploy the optimizer on Kubernetes using Virtual MCP +Server (vMCP) and an EmbeddingServer for semantic tool search. -### Benefits +## What you'll learn -- **Reduced token usage**: Narrow down the toolset to only relevant tools for a - given task, minimizing context overload and token consumption -- **Improved tool selection**: Find the most appropriate tools across all - connected MCP servers -- **Simplified client configuration**: Connect to a single MCP Optimizer - endpoint instead of managing multiple MCP server connections +- How to create an MCPGroup with multiple backend MCP servers +- How to deploy an EmbeddingServer for semantic search +- How to create a VirtualMCPServer with the optimizer enabled +- How to connect your AI client to the optimized endpoint +- How to verify the optimizer reduces the visible toolset to `find_tool` and + `call_tool` -### How it works +## About MCP Optimizer -Instead of flooding the model with all available tools, MCP Optimizer introduces -two lightweight primitives: +Instead of exposing every backend tool to the model, the optimizer introduces +two lightweight primitives: `find_tool` for semantic search and `call_tool` for +routing. This keeps context small and improves tool selection accuracy. For the +full parameter reference and configuration options, see +[Optimize tool discovery](../guides-vmcp/optimizer.mdx). -1. `find_tool`: Searches for the most relevant tools using hybrid semantic + - keyword search -2. `call_tool`: Routes the selected tool request to the appropriate MCP server +### How it works The workflow is as follows: -1. You send a prompt that requires tool assistance (for example, interacting - with a GitHub repo) +1. You send a prompt that requires tool assistance (for example, fetching a web + page) 2. The assistant calls `find_tool` with relevant keywords extracted from the prompt -3. 
MCP Optimizer returns the most relevant tools (up to 8 by default, but this +3. The optimizer returns the most relevant tools (up to 8 by default, but this is configurable) 4. Only those tools and their descriptions are included in the context sent to the model @@ -56,255 +59,343 @@ The workflow is as follows: ```mermaid flowchart TB - subgraph optimizerGroup["MCP Optimizer group (internal)"] + subgraph vmcpGroup["VirtualMCPServer"] direction TB - optimizer["MCP Optimizer"] + vmcp["vMCP (optimizer enabled)"] end - subgraph target["ToolHive group: default"] + subgraph embedding["EmbeddingServer"] + direction TB + tei["Text Embeddings Inference"] + end + subgraph backends["MCPGroup backends"] direction TB mcp1["MCP server"] mcp2["MCP server"] mcp3["MCP server"] end - client(["Client"]) <-- connects --> optimizerGroup - optimizer <-. discovers/routes .-> target + client(["Client"]) <-- "find_tool / call_tool" --> vmcpGroup + vmcp <-. "semantic search" .-> embedding + vmcp <-. "discovers / routes" .-> backends ``` ## Prerequisites -- One of the following container runtimes: - - macOS: Docker Desktop, Podman Desktop, or Rancher Desktop (using dockerd) - - Windows: Docker Desktop or Rancher Desktop (using dockerd) - - Linux: any container runtime (see [Linux setup](#linux-setup)) -- ToolHive CLI +Before starting this tutorial, make sure you have: + +- A Kubernetes cluster with the ToolHive operator installed (see + [Quickstart: Kubernetes Operator](../guides-k8s/quickstart.mdx)) +- `kubectl` configured to communicate with your cluster +- The [ToolHive CLI](../guides-cli/quickstart.mdx) installed on your local + machine (used in Step 4 to register the endpoint with your MCP clients) +- An MCP client (Visual Studio Code with GitHub Copilot is used in this + tutorial) -## Step 1: Install MCP servers in a ToolHive group +:::warning[ARM64 compatibility] -Before you can use MCP Optimizer, you need to have one or more MCP servers -running in a ToolHive group. 
If you don't have any MCP servers set up yet, -follow these steps: +The default text embeddings inference (TEI) images depend on Intel MKL, which is +x86_64-only. If you are using Apple Silicon or any other ARM64 node (including +kind on macOS), you need to pre-pull the amd64 image before proceeding. See +[ARM64 compatibility](../guides-vmcp/optimizer.mdx#arm64-compatibility) for the +workaround steps. -Run one or more MCP servers in the `default` group. For this tutorial, you can -run the following example MCP servers: +::: -- `github`: Provides tools for interacting with GitHub repositories - ([guide](../guides-mcp/github.mdx?mode=cli)) -- `fetch`: Provides a web search tool to fetch recent news articles -- `time`: Provides a tool to get the current time in various time zones +## Step 1: Create an MCPGroup and deploy backend MCP servers -```bash -thv run github -thv run fetch -thv run time -``` +Create an MCPGroup to organize the backend MCP servers that the optimizer will +index and route to: -See the [Run MCP servers](../guides-cli/run-mcp-servers.mdx) guide for more -details. +```yaml title="mcpgroup.yaml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPGroup +metadata: + name: optimizer-demo + namespace: toolhive-system +spec: + description: Backend servers for the optimizer tutorial +``` -Verify the MCP servers are running: +Apply the resource: ```bash -thv list +kubectl apply -f mcpgroup.yaml ``` -## Step 2: Connect your AI client +Next, deploy two MCP servers in the group. 
Both reference `optimizer-demo` in +the `groupRef` field: + +```yaml {11,30} title="mcpservers.yaml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServer +metadata: + name: fetch + namespace: toolhive-system +spec: + image: ghcr.io/stackloklabs/gofetch/server + transport: streamable-http + proxyPort: 8080 + mcpPort: 8080 + groupRef: optimizer-demo + resources: + limits: + cpu: '100m' + memory: '128Mi' + requests: + cpu: '50m' + memory: '64Mi' +--- +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServer +metadata: + name: osv + namespace: toolhive-system +spec: + image: ghcr.io/stackloklabs/osv-mcp/server + transport: streamable-http + proxyPort: 8080 + mcpPort: 8080 + groupRef: optimizer-demo + resources: + limits: + cpu: '100m' + memory: '128Mi' + requests: + cpu: '50m' + memory: '64Mi' +``` -Connect your AI client to the ToolHive group where the MCP servers are running -(for example, the `default` group). +Apply the resources and wait for both servers to be ready: -:::note +```bash +kubectl apply -f mcpservers.yaml +kubectl get mcpservers -n toolhive-system -w +``` -For best results, connect your client to only the optimized group. If you -connect it to multiple groups, ensure there is no overlap in MCP servers between -the groups to avoid unpredictable behavior. +You should see both servers with `Ready` status before continuing. -::: +:::note -Run the following command to register your AI client with the ToolHive group -where the MCP servers are running (for example, `default`): +If you still have an MCPServer left over from the +[K8s Operator Quickstart](../guides-k8s/quickstart.mdx), you can delete it first +to avoid confusion: ```bash -thv client setup +kubectl delete mcpserver fetch -n toolhive-system ``` -See the [Client configuration](../guides-cli/client-configuration.mdx) guide for -more details. +Then apply the YAML above, which creates a new `fetch` server with the correct +`groupRef`. 
-Open your AI client and verify that it is connected to the correct MCP servers. -If you installed the `github`, `fetch`, and `time` servers, you should see -almost 50 tools available. - -## Step 3: Enable MCP Optimizer +::: -If you are on Linux with native containers, follow the steps below but see -[Linux setup](#linux-setup) for the modified `thv run` command. +## Step 2: Deploy an EmbeddingServer -**Step 3.1: Run the API server** +The optimizer uses semantic search to find relevant tools. This requires an +EmbeddingServer, which runs a text embeddings inference (TEI) server. -MCP Optimizer uses the ToolHive API server to discover MCP servers and manage -client connections. +Create an EmbeddingServer with default settings. This deploys the +`BAAI/bge-small-en-v1.5` model: -You can run the API server in two ways. The simplest is to install and run the -ToolHive UI, which automatically starts the API server in the background. +```yaml title="embedding-server.yaml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: EmbeddingServer +metadata: + name: optimizer-embedding + namespace: toolhive-system +spec: {} +``` -If you prefer to run the API server manually using the CLI, open a dedicated -terminal window and start it on a specific port: +Apply the resource: ```bash -thv serve --port 50100 +kubectl apply -f embedding-server.yaml ``` -Note the port number (`50100` in this example) for use in the next step. - -**Step 3.2: Create a dedicated group and run mcp-optimizer** +Wait for the EmbeddingServer to reach the `Ready` phase before proceeding. The +first startup may take a few minutes while the model downloads: ```bash -# Create the meta group -thv group create optimizer - -# Run mcp-optimizer in the dedicated group -thv run --group optimizer -e TOOLHIVE_PORT=50100 mcp-optimizer +kubectl get embeddingserver optimizer-embedding -n toolhive-system -w ``` -If you are running the API server using the ToolHive UI, omit the -`TOOLHIVE_PORT` environment variable. 
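+
+The empty `spec` applies the defaults described above. As a sketch, if you need
+to pin a specific TEI image version instead of the default tag, the resource
+also accepts an `image` field (the tag below is illustrative):
+
+```yaml
+apiVersion: toolhive.stacklok.dev/v1alpha1
+kind: EmbeddingServer
+metadata:
+  name: optimizer-embedding
+  namespace: toolhive-system
+spec:
+  image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
+```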
+:::info[What's happening?] + +The EmbeddingServer deploys a TEI container that generates vector embeddings +from text. The optimizer uses these embeddings to perform semantic search across +all backend tools, finding the most relevant tools for a given query even when +the exact keywords don't match. -**Step 3.3: Configure your AI client for the meta group** +::: + +## Step 3: Create a VirtualMCPServer with the optimizer + +Create a VirtualMCPServer that aggregates the backend servers and enables the +optimizer. Adding `embeddingServerRef` is the only change needed to enable the +optimizer - sensible defaults are applied automatically: + +```yaml {8-9} title="virtualmcpserver.yaml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: VirtualMCPServer +metadata: + name: optimizer-vmcp + namespace: toolhive-system +spec: + # highlight-start + embeddingServerRef: + name: optimizer-embedding + # highlight-end + incomingAuth: + type: anonymous + serviceType: ClusterIP + config: + groupRef: optimizer-demo + aggregation: + conflictResolution: prefix + conflictResolutionConfig: + prefixFormat: '{workload}_' +``` -Remove your client from the `default` group. For example, to unregister Cursor: +Apply the resource: ```bash -thv client remove cursor --group default +kubectl apply -f virtualmcpserver.yaml ``` -Then, register your client with the `optimizer` group: +Check the status: ```bash -# Run the group setup, select the optimizer group, and then select your client -thv client setup +kubectl get virtualmcpservers -n toolhive-system +``` + +After about 30 seconds, you should see output similar to: -# Verify the configuration -thv client list-registered +```text +NAME PHASE URL BACKENDS AGE READY +optimizer-vmcp Ready http://vmcp-optimizer-vmcp.toolhive-system.svc.cluster.local:4483 2 30s True ``` -:::note +:::info[What's happening?] -Your client now connects only to the `optimizer` group and sees only the -`mcp-optimizer` MCP server. 
+Setting `embeddingServerRef` tells the operator to enable the optimizer on this +VirtualMCPServer. Instead of exposing all backend tools directly, the optimizer +builds a semantic index of tools and exposes only `find_tool` and `call_tool` to +clients. This dramatically reduces the number of tools (and tokens) sent to the +model. ::: -The resulting configuration should look like this: +## Step 4: Connect your AI client -```mermaid -flowchart TB - subgraph meta["ToolHive group: optimizer"] - direction TB - optimizer["mcp-optimizer"] - end - subgraph def["ToolHive group: default"] - direction TB - mcp1["github"] - mcp2["fetch"] - mcp3["time"] - end +The vMCP service runs inside Kubernetes and is not directly reachable by desktop +AI clients. This tutorial uses `kubectl port-forward` because it works with any +cluster, but in production you would typically expose the service through an +Ingress, Gateway API, or LoadBalancer. See +[Expose the service](../guides-vmcp/configuration.mdx#expose-the-service) for +the available options. - client(["Client"]) <-- connects --> meta - optimizer <-. discovers/routes .-> def - client x-. 🚫 .-x def +In a separate terminal, port-forward the vMCP service to your local machine: + +```bash +kubectl port-forward service/vmcp-optimizer-vmcp -n toolhive-system 4483:4483 ``` -## Step 4: Sample prompts +Test the health endpoint: -After you configure and run MCP Optimizer, you can use the same prompts you -would normally use with individual MCP servers. The Optimizer automatically -discovers and routes to appropriate tools. +```bash +curl http://localhost:4483/health +``` -Using the example MCP servers above, here are some sample prompts: +You should see `{"status":"ok"}`. -- "Get the details of GitHub issue 1911 from the stacklok/toolhive repo" -- "List recent PRs from the stacklok/toolhive repo" -- "Fetch the latest news articles about AI" -- "What is the current time in Tokyo?" 
+The ToolHive CLI bridges the remaining gap: it registers the port-forwarded +endpoint as a local workload and automatically updates your MCP client +configuration to point at it. -Watch how MCP Optimizer intelligently selects and routes to the relevant tools -across the connected MCP servers, reducing token usage and improving response -quality. +Register the port-forwarded vMCP endpoint as a ToolHive-managed workload: -To check your token savings, you can ask the optimizer: +```bash +thv run http://localhost:4483/mcp --name optimizer-vmcp +``` -- "How many tokens did I save using MCP Optimizer?" +:::tip -## Linux setup +If you haven't set up client configuration yet, run `thv client setup` to +register your MCP clients. See +[Client configuration](../guides-cli/client-configuration.mdx) for more details. -The setup depends on which type of container runtime you are using. +::: -### VM-based container runtimes +Open your AI client and check its MCP configuration. You should see only two +tools available: `find_tool` and `call_tool`. This confirms the optimizer is +working. -If you are using a container runtime that runs containers inside a virtual -machine (such as Docker Desktop for Linux), the setup is the same as on macOS -and Windows. No additional configuration is needed - follow the steps above. +## Step 5: Test the optimizer -### Native containers (Docker, Podman, Rancher Desktop, and others) +Try these sample prompts to verify the optimizer is routing requests correctly +across both backend MCP servers: -:::note +- "Fetch the contents of https://docs.stacklok.com and summarize the page" +- "Check if the Go package github.com/stacklok/toolhive has any known + vulnerabilities" -Before running the command below, complete the following: +Watch how the optimizer uses `find_tool` to locate the relevant tool across all +backends, then `call_tool` to execute it - all through a single endpoint. -1. 
[Step 1](#step-1-install-mcp-servers-in-a-toolhive-group) - install your MCP - servers -2. [Step 2](#step-2-connect-your-ai-client) - connect your AI client -3. [Step 3](#step-3-enable-mcp-optimizer) - start the API server, create the - optimizer group, and reconfigure your client. When you reach the - `thv run mcp-optimizer` command, use the Linux-specific command below - instead. +To check your token savings, send this prompt to your AI client: -::: +- "How many tokens did I save using MCP Optimizer?" + +## Clean up -Most Linux container runtimes run containers natively on the host kernel. -Because containers run directly on the host kernel, `host.docker.internal` is -not automatically configured - unlike on macOS and Windows, where Docker Desktop -sets it up to let containers reach the host from inside a virtual machine. -Instead, you need to pass a couple of extra flags: +Delete the resources when you're done: ```bash -# Run mcp-optimizer with host networking -thv run --group optimizer --network host \ - -e TOOLHIVE_HOST=127.0.0.1 \ - -e ALLOWED_GROUPS=default \ - mcp-optimizer +kubectl delete virtualmcpserver optimizer-vmcp -n toolhive-system +kubectl delete embeddingserver optimizer-embedding -n toolhive-system +kubectl delete mcpserver fetch osv -n toolhive-system +kubectl delete mcpgroup optimizer-demo -n toolhive-system ``` -- `--network host` lets the container reach the host directly, achieving the - same result as the automatic bridge Docker Desktop sets up on macOS and - Windows. -- `TOOLHIVE_PORT` specifies the port the API server is listening on. If you - started it manually with a custom port in Step 3.1, pass - `-e TOOLHIVE_PORT=` here as well. Omit it if you are using the ToolHive - UI to run the API server. -- `TOOLHIVE_HOST` tells `mcp-optimizer` to connect to `127.0.0.1` instead of - `host.docker.internal`. -- `ALLOWED_GROUPS` tells the optimizer which group's MCP servers to discover, - index, and route requests to. 
Replace `default` with the name of the group you - want to optimize. +To tear down the entire kind cluster from the K8s Quickstart: -To change which groups MCP Optimizer can optimize after initial setup, remove -the workload and run the command again with the updated `ALLOWED_GROUPS` value -(see [Remove a server](../guides-cli/manage-mcp-servers.mdx#remove-a-server)). +```bash +kind delete cluster --name toolhive +``` + +:::note[Legacy: standalone MCP Optimizer] + +The standalone `mcp-optimizer` container can also run alongside ToolHive on +desktop (macOS, Windows, Linux) without Kubernetes. This approach is being +replaced by the Kubernetes-based optimizer described above. For the standalone +setup, see the [MCP Optimizer UI guide](../guides-ui/mcp-optimizer.mdx). + +::: -See [Step 4: Sample prompts](#step-4-sample-prompts) to verify the setup. +## Next steps -## What's next? +Now that you've set up the MCP Optimizer, consider exploring these next steps: -- Experiment with different MCP servers to see how MCP Optimizer enhances tool - selection and reduces token usage -- Explore the [vMCP optimizer](../guides-vmcp/optimizer.mdx) for team-level - optimization in Kubernetes +- [Tune the optimizer](../guides-vmcp/optimizer.mdx#tune-the-optimizer) to + adjust search parameters for your workload +- [Configure authentication](../guides-vmcp/authentication.mdx) for production + deployments +- [Monitor vMCP activity](../guides-vmcp/telemetry-and-metrics.mdx) with + OpenTelemetry tracing and metrics +- [Configure failure handling](../guides-vmcp/failure-handling.mdx) for circuit + breakers and partial failure modes +- Provide feedback on your experience on the + [Stacklok Discord community](https://discord.gg/stacklok) ## Related information -- [Optimize tool discovery in vMCP](../guides-vmcp/optimizer.mdx) - Kubernetes - operator approach +- [Optimize tool discovery](../guides-vmcp/optimizer.mdx) - full parameter + reference, high availability, and ARM64 workaround 
details - [Optimizing LLM context](../concepts/tool-optimization.mdx) - background on tool filtering and context pollution +- [Virtual MCP Server overview](../concepts/vmcp.mdx) - conceptual overview of + vMCP +- [MCP Optimizer UI guide](../guides-ui/mcp-optimizer.mdx) - standalone desktop + approach (legacy) +- [Quickstart: Kubernetes Operator](../guides-k8s/quickstart.mdx) - prerequisite + tutorial From a673ec16464fc09f0f5873c941d1dd23fd0b0ecb Mon Sep 17 00:00:00 2001 From: Alejandro Ponce Date: Thu, 16 Apr 2026 11:35:42 +0300 Subject: [PATCH 2/4] Address PR review feedback for optimizer tutorial Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/toolhive/guides-vmcp/optimizer.mdx | 22 +++++++----- docs/toolhive/tutorials/mcp-optimizer.mdx | 44 ++++++++++------------- 2 files changed, 33 insertions(+), 33 deletions(-) diff --git a/docs/toolhive/guides-vmcp/optimizer.mdx b/docs/toolhive/guides-vmcp/optimizer.mdx index 1b1a9000..42110407 100644 --- a/docs/toolhive/guides-vmcp/optimizer.mdx +++ b/docs/toolhive/guides-vmcp/optimizer.mdx @@ -148,11 +148,21 @@ For the complete field reference, see the :::warning[ARM64 compatibility] -The default TEI CPU images depend on Intel MKL, which is x86_64-only. No -official ARM64 images exist yet. On ARM64 nodes (including Apple Silicon with -kind), you can run the amd64 image under emulation as a workaround. +The default TEI CPU images depend on Intel MKL, which is x86_64-only. Native +ARM64 support has been merged upstream but is not yet included in a published +release. Track the +[TEI GitHub repository](https://github.com/huggingface/text-embeddings-inference) +for updates on ARM64 image availability. + +In the meantime, you can run the amd64 image under emulation on ARM64 nodes. If +you are using Docker Desktop, you must first disable the containerd image store +(**Settings > General > uncheck "Use containerd for pulling and storing +images" > Apply & Restart**). 
Without this, `kind load docker-image` silently +fails because the containerd store preserves multi-arch manifest indexes that +kind cannot import. See +[kind#3795](https://github.com/kubernetes-sigs/kind/issues/3795) for details. -First, pull the amd64 image and load it into your cluster: +Then pull the amd64 image and load it into your cluster: ```bash docker pull --platform linux/amd64 \ @@ -178,10 +188,6 @@ spec: image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 ``` -Native ARM64 support is in progress upstream. Track the -[TEI GitHub repository](https://github.com/huggingface/text-embeddings-inference) -for updates. - ::: ## Tune the optimizer diff --git a/docs/toolhive/tutorials/mcp-optimizer.mdx b/docs/toolhive/tutorials/mcp-optimizer.mdx index f8400c83..2272a83a 100644 --- a/docs/toolhive/tutorials/mcp-optimizer.mdx +++ b/docs/toolhive/tutorials/mcp-optimizer.mdx @@ -31,9 +31,8 @@ Server (vMCP) and an EmbeddingServer for semantic tool search. - How to create an MCPGroup with multiple backend MCP servers - How to deploy an EmbeddingServer for semantic search - How to create a VirtualMCPServer with the optimizer enabled -- How to connect your AI client to the optimized endpoint -- How to verify the optimizer reduces the visible toolset to `find_tool` and - `call_tool` +- How to connect your AI client to the optimized endpoint and verify it exposes + only `find_tool` and `call_tool` ## About MCP Optimizer @@ -94,10 +93,12 @@ Before starting this tutorial, make sure you have: :::warning[ARM64 compatibility] The default text embeddings inference (TEI) images depend on Intel MKL, which is -x86_64-only. If you are using Apple Silicon or any other ARM64 node (including -kind on macOS), you need to pre-pull the amd64 image before proceeding. See -[ARM64 compatibility](../guides-vmcp/optimizer.mdx#arm64-compatibility) for the -workaround steps. +x86_64-only. Native ARM64 support has been merged upstream but is not yet +included in a published release. 
If you are using Apple Silicon or any other +ARM64 nodes (including kind on macOS), you can run the amd64 image under +emulation as a workaround. See the +[EmbeddingServer resource](../guides-vmcp/optimizer.mdx#embeddingserver-resource) +section for the required steps, including a Docker Desktop configuration change. ::: @@ -125,7 +126,7 @@ kubectl apply -f mcpgroup.yaml Next, deploy two MCP servers in the group. Both reference `optimizer-demo` in the `groupRef` field: -```yaml {11,30} title="mcpservers.yaml" +```yaml {11-12,31-32} title="mcpservers.yaml" apiVersion: toolhive.stacklok.dev/v1alpha1 kind: MCPServer metadata: @@ -136,7 +137,8 @@ spec: transport: streamable-http proxyPort: 8080 mcpPort: 8080 - groupRef: optimizer-demo + groupRef: + name: optimizer-demo resources: limits: cpu: '100m' @@ -155,7 +157,8 @@ spec: transport: streamable-http proxyPort: 8080 mcpPort: 8080 - groupRef: optimizer-demo + groupRef: + name: optimizer-demo resources: limits: cpu: '100m' @@ -234,7 +237,7 @@ Create a VirtualMCPServer that aggregates the backend servers and enables the optimizer. 
Adding `embeddingServerRef` is the only change needed to enable the optimizer - sensible defaults are applied automatically: -```yaml {8-9} title="virtualmcpserver.yaml" +```yaml title="virtualmcpserver.yaml" apiVersion: toolhive.stacklok.dev/v1alpha1 kind: VirtualMCPServer metadata: @@ -249,7 +252,8 @@ spec: type: anonymous serviceType: ClusterIP config: - groupRef: optimizer-demo + groupRef: + name: optimizer-demo aggregation: conflictResolution: prefix conflictResolutionConfig: @@ -348,9 +352,10 @@ To check your token savings, send this prompt to your AI client: ## Clean up -Delete the resources when you're done: +Remove the local workload and delete the Kubernetes resources when you're done: ```bash +thv rm optimizer-vmcp kubectl delete virtualmcpserver optimizer-vmcp -n toolhive-system kubectl delete embeddingserver optimizer-embedding -n toolhive-system kubectl delete mcpserver fetch osv -n toolhive-system @@ -363,19 +368,8 @@ To tear down the entire kind cluster from the K8s Quickstart: kind delete cluster --name toolhive ``` -:::note[Legacy: standalone MCP Optimizer] - -The standalone `mcp-optimizer` container can also run alongside ToolHive on -desktop (macOS, Windows, Linux) without Kubernetes. This approach is being -replaced by the Kubernetes-based optimizer described above. For the standalone -setup, see the [MCP Optimizer UI guide](../guides-ui/mcp-optimizer.mdx). 
-
-:::
-
 ## Next steps
 
-Now that you've set up the MCP Optimizer, consider exploring these next steps:
-
 - [Tune the optimizer](../guides-vmcp/optimizer.mdx#tune-the-optimizer) to
   adjust search parameters for your workload
 - [Configure authentication](../guides-vmcp/authentication.mdx) for production
@@ -396,6 +390,6 @@ Now that you've set up the MCP Optimizer, consider exploring these next steps:
 - [Virtual MCP Server overview](../concepts/vmcp.mdx) - conceptual overview of
   vMCP
 - [MCP Optimizer UI guide](../guides-ui/mcp-optimizer.mdx) - standalone desktop
-  approach (legacy)
+  approach without Kubernetes (legacy, being replaced by the vMCP path)
 - [Quickstart: Kubernetes Operator](../guides-k8s/quickstart.mdx) - prerequisite
   tutorial

From fdc70b970e272a87037d593916c48ea386cddf85 Mon Sep 17 00:00:00 2001
From: Alejandro Ponce
Date: Thu, 16 Apr 2026 15:37:10 +0300
Subject: [PATCH 3/4] Use ARM64 TEI images, move groupRef to spec level

Replace the ARM64 emulation workaround with the now-published cpu-arm64-latest
image. Move groupRef from spec.config to spec level in all VirtualMCPServer
examples to match the current CRD. Address remaining PR review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/toolhive/guides-vmcp/optimizer.mdx   | 37 ++++------------------
 docs/toolhive/tutorials/mcp-optimizer.mdx | 29 ++++++++++--------
 2 files changed, 23 insertions(+), 43 deletions(-)

diff --git a/docs/toolhive/guides-vmcp/optimizer.mdx b/docs/toolhive/guides-vmcp/optimizer.mdx
index 42110407..a6b005cf 100644
--- a/docs/toolhive/guides-vmcp/optimizer.mdx
+++ b/docs/toolhive/guides-vmcp/optimizer.mdx
@@ -146,37 +146,10 @@ are:
 For the complete field reference, see the
 [EmbeddingServer CRD specification](../reference/crd-spec.md#apiv1alpha1embeddingserver).
 
-:::warning[ARM64 compatibility]
+:::tip[ARM64 support]
 
-The default TEI CPU images depend on Intel MKL, which is x86_64-only. Native
-ARM64 support has been merged upstream but is not yet included in a published
-release. Track the
-[TEI GitHub repository](https://github.com/huggingface/text-embeddings-inference)
-for updates on ARM64 image availability.
-
-In the meantime, you can run the amd64 image under emulation on ARM64 nodes. If
-you are using Docker Desktop, you must first disable the containerd image store
-(**Settings > General > uncheck "Use containerd for pulling and storing
-images" > Apply & Restart**). Without this, `kind load docker-image` silently
-fails because the containerd store preserves multi-arch manifest indexes that
-kind cannot import. See
-[kind#3795](https://github.com/kubernetes-sigs/kind/issues/3795) for details.
-
-Then pull the amd64 image and load it into your cluster:
-
-```bash
-docker pull --platform linux/amd64 \
-  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
-kind load docker-image \
-  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
-```
-
-The `kind load` command is specific to kind. For other cluster distributions,
-use the equivalent image-loading mechanism (for example, `ctr images import` for
-containerd, or push the image to a registry your cluster can pull from).
-
-Then, pin the image in your EmbeddingServer so the operator uses the pre-pulled
-tag instead of the default `cpu-latest`:
+The default TEI image (`cpu-latest`) is x86_64-only. If you are running on ARM64
+nodes (for example, Apple Silicon), override the image in your EmbeddingServer:
 
 ```yaml title="embedding-server.yaml"
 apiVersion: toolhive.stacklok.dev/v1alpha1
@@ -185,7 +158,7 @@ metadata:
   name: my-embedding
   namespace: toolhive-system
 spec:
-  image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
+  image: ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-latest
 ```
 
 :::
@@ -294,6 +267,8 @@ metadata:
   name: full-vmcp
   namespace: toolhive-system
 spec:
+  groupRef:
+    name: my-tools
   embeddingServerRef:
     name: full-embedding
   groupRef:
diff --git a/docs/toolhive/tutorials/mcp-optimizer.mdx b/docs/toolhive/tutorials/mcp-optimizer.mdx
index 2272a83a..6332ffec 100644
--- a/docs/toolhive/tutorials/mcp-optimizer.mdx
+++ b/docs/toolhive/tutorials/mcp-optimizer.mdx
@@ -31,8 +31,7 @@ Server (vMCP) and an EmbeddingServer for semantic tool search.
 - How to create an MCPGroup with multiple backend MCP servers
 - How to deploy an EmbeddingServer for semantic search
 - How to create a VirtualMCPServer with the optimizer enabled
-- How to connect your AI client to the optimized endpoint and verify it exposes
-  only `find_tool` and `call_tool`
+- How to connect your AI client to the optimized endpoint
 
 ## About MCP Optimizer
 
@@ -90,15 +89,13 @@ Before starting this tutorial, make sure you have:
 - An MCP client (Visual Studio Code with GitHub Copilot is used in this
   tutorial)
 
-:::warning[ARM64 compatibility]
+:::tip[ARM64 support]
 
-The default text embeddings inference (TEI) images depend on Intel MKL, which is
-x86_64-only. Native ARM64 support has been merged upstream but is not yet
-included in a published release. If you are using Apple Silicon or any other
-ARM64 nodes (including kind on macOS), you can run the amd64 image under
-emulation as a workaround. See the
+The default TEI image is x86_64-only. If you are running on ARM64 nodes (for
+example, Apple Silicon with kind), set the `image` field in your EmbeddingServer
+to use the ARM64 image. See
 [EmbeddingServer resource](../guides-vmcp/optimizer.mdx#embeddingserver-resource)
-section for the required steps, including a Docker Desktop configuration change.
+for details.
 
 :::
 
@@ -245,6 +242,8 @@ metadata:
   namespace: toolhive-system
 spec:
   # highlight-start
+  groupRef:
+    name: optimizer-demo
   embeddingServerRef:
     name: optimizer-embedding
   # highlight-end
@@ -252,8 +251,6 @@ spec:
   auth:
     type: anonymous
   serviceType: ClusterIP
   config:
-    groupRef:
-      name: optimizer-demo
     aggregation:
       conflictResolution: prefix
       conflictResolutionConfig:
@@ -350,6 +347,14 @@ To check your token savings, send this prompt to your AI client:
 
 - "How many tokens did I save using MCP Optimizer?"
 
+:::note
+
+With only two backend MCP servers and a small number of tools, the optimizer may
+report minimal or no token savings. The benefit becomes more significant as you
+add more backends and tools to your MCPGroup.
+
+:::
+
 ## Clean up
 
 Remove the local workload and delete the Kubernetes resources when you're done:
@@ -384,7 +389,7 @@ kind delete cluster --name toolhive
 
 ## Related information
 
 - [Optimize tool discovery](../guides-vmcp/optimizer.mdx) - full parameter
-  reference, high availability, and ARM64 workaround details
+  reference, high availability, and ARM64 support details
 - [Optimizing LLM context](../concepts/tool-optimization.mdx) - background on
   tool filtering and context pollution
 - [Virtual MCP Server overview](../concepts/vmcp.mdx) - conceptual overview of

From e6d130adf4e72223d5305577ee3210f47b23c03d Mon Sep 17 00:00:00 2001
From: Alejandro Ponce
Date: Thu, 16 Apr 2026 16:18:02 +0300
Subject: [PATCH 4/4] Inline ARM64 guidance in EmbeddingServer step

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/toolhive/tutorials/mcp-optimizer.mdx | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/docs/toolhive/tutorials/mcp-optimizer.mdx b/docs/toolhive/tutorials/mcp-optimizer.mdx
index 6332ffec..337db246 100644
--- a/docs/toolhive/tutorials/mcp-optimizer.mdx
+++ b/docs/toolhive/tutorials/mcp-optimizer.mdx
@@ -89,16 +89,6 @@ Before starting this tutorial, make sure you have:
 - An MCP client (Visual Studio Code with GitHub Copilot is used in this
   tutorial)
 
-:::tip[ARM64 support]
-
-The default TEI image is x86_64-only. If you are running on ARM64 nodes (for
-example, Apple Silicon with kind), set the `image` field in your EmbeddingServer
-to use the ARM64 image. See
-[EmbeddingServer resource](../guides-vmcp/optimizer.mdx#embeddingserver-resource)
-for details.
-
-:::
-
 ## Step 1: Create an MCPGroup and deploy backend MCP servers
 
 Create an MCPGroup to organize the backend MCP servers that the optimizer will
@@ -195,7 +185,8 @@ The optimizer uses semantic search to find relevant tools. This requires an
 EmbeddingServer, which runs a text embeddings inference (TEI) server.
 
 Create an EmbeddingServer with default settings. This deploys the
-`BAAI/bge-small-en-v1.5` model:
+`BAAI/bge-small-en-v1.5` model. If you are running on ARM64 nodes (for example,
+Apple Silicon with kind), uncomment the `image` line to use the ARM64 build:
 
 ```yaml title="embedding-server.yaml"
 apiVersion: toolhive.stacklok.dev/v1alpha1
@@ -203,7 +194,9 @@ kind: EmbeddingServer
 metadata:
   name: optimizer-embedding
   namespace: toolhive-system
-spec: {}
+spec:
+  # Uncomment for Apple Silicon or other ARM64 platforms
+  # image: ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-latest
 ```
 
 Apply the resource:
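
The excerpt above ends at the apply step. For reference, this is what the EmbeddingServer manifest from that step looks like with the ARM64 override uncommented, a sketch using the tutorial's resource names; on x86_64 nodes, leave `spec` empty so the operator uses the default `cpu-latest` image:

```yaml
# EmbeddingServer manifest with the ARM64 image override active
# (names taken from the tutorial: optimizer-embedding in toolhive-system)
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: EmbeddingServer
metadata:
  name: optimizer-embedding
  namespace: toolhive-system
spec:
  # Overrides the default cpu-latest image, which is x86_64-only
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-latest
```

Applying it follows the usual CRD workflow, for example `kubectl apply -f embedding-server.yaml`, after which the operator deploys a TEI server running the `BAAI/bge-small-en-v1.5` model.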