From b58f9ef64f124a34bece7b1f27d3c5da2b0e17a1 Mon Sep 17 00:00:00 2001 From: Alejandro Ponce Date: Wed, 15 Apr 2026 17:41:07 +0300 Subject: [PATCH 1/4] Update MCP Optimizer tutorial to use K8s/vMCP path Replace the deprecated Python-based standalone optimizer tutorial with a Kubernetes-first walkthrough using VirtualMCPServer and EmbeddingServer CRDs. Update cross-references in the vMCP optimizer guide accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/toolhive/guides-vmcp/optimizer.mdx | 7 +- docs/toolhive/tutorials/mcp-optimizer.mdx | 455 +++++++++++++--------- 2 files changed, 277 insertions(+), 185 deletions(-) diff --git a/docs/toolhive/guides-vmcp/optimizer.mdx b/docs/toolhive/guides-vmcp/optimizer.mdx index aa24b63e..1b1a9000 100644 --- a/docs/toolhive/guides-vmcp/optimizer.mdx +++ b/docs/toolhive/guides-vmcp/optimizer.mdx @@ -10,9 +10,9 @@ number of tools exposed to clients can grow quickly. The optimizer addresses this by filtering tools per request, reducing token usage and improving tool selection accuracy. -For the desktop/CLI approach using the MCP Optimizer container, see the +For a step-by-step tutorial that walks through the full setup, see the [MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx). This guide covers the -Kubernetes operator approach using VirtualMCPServer and EmbeddingServer CRDs. +configuration details for the VirtualMCPServer and EmbeddingServer CRDs. 
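+For example, on an existing VirtualMCPServer the optimizer is enabled with a
+single field reference (a minimal sketch; the resource names are illustrative
+and other required fields such as `groupRef` are omitted):
+
+```yaml
+apiVersion: toolhive.stacklok.dev/v1alpha1
+kind: VirtualMCPServer
+metadata:
+  name: my-vmcp
+  namespace: toolhive-system
+spec:
+  embeddingServerRef:
+    name: my-embedding
+```
+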
## Benefits @@ -314,7 +314,8 @@ spec: ## Related information -- [MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx) - desktop/CLI setup +- [MCP Optimizer tutorial](../tutorials/mcp-optimizer.mdx) - end-to-end + Kubernetes setup - [Optimizing LLM context](../concepts/tool-optimization.mdx) - background on tool filtering and context pollution - [Configure vMCP servers](./configuration.mdx) diff --git a/docs/toolhive/tutorials/mcp-optimizer.mdx b/docs/toolhive/tutorials/mcp-optimizer.mdx index d0ebaaad..f8400c83 100644 --- a/docs/toolhive/tutorials/mcp-optimizer.mdx +++ b/docs/toolhive/tutorials/mcp-optimizer.mdx @@ -1,54 +1,57 @@ --- title: Reduce token usage with MCP Optimizer description: - Enable the MCP Optimizer to enhance tool selection and reduce token usage. + Deploy the MCP Optimizer on Kubernetes with Virtual MCP Server and an + EmbeddingServer to filter tools and reduce token usage. schema_type: tutorial --- ## Overview -The ToolHive MCP Optimizer acts as an intelligent intermediary between AI -clients and MCP servers. It provides tool discovery, unified access to multiple -MCP servers through a single endpoint, and intelligent routing of requests to -appropriate MCP tools. +The MCP Optimizer acts as an intelligent intermediary between AI clients and MCP +servers. It provides tool discovery, unified access to multiple MCP servers +through a single endpoint, and intelligent routing of requests to appropriate +MCP tools. :::note[Moving to vMCP] The optimizer is now integrated into [Virtual MCP Server (vMCP)](../guides-vmcp/optimizer.mdx), which provides the same tool filtering and token reduction at the team level. You can deploy it in -Kubernetes today, and a local experience is coming soon. This tutorial covers -the standalone CLI approach in the meantime. +Kubernetes today, and a local experience is coming soon. This tutorial walks you +through the Kubernetes deployment. 
::: -## About MCP Optimizer +In this tutorial, you deploy the optimizer on Kubernetes using Virtual MCP +Server (vMCP) and an EmbeddingServer for semantic tool search. -### Benefits +## What you'll learn -- **Reduced token usage**: Narrow down the toolset to only relevant tools for a - given task, minimizing context overload and token consumption -- **Improved tool selection**: Find the most appropriate tools across all - connected MCP servers -- **Simplified client configuration**: Connect to a single MCP Optimizer - endpoint instead of managing multiple MCP server connections +- How to create an MCPGroup with multiple backend MCP servers +- How to deploy an EmbeddingServer for semantic search +- How to create a VirtualMCPServer with the optimizer enabled +- How to connect your AI client to the optimized endpoint +- How to verify the optimizer reduces the visible toolset to `find_tool` and + `call_tool` -### How it works +## About MCP Optimizer -Instead of flooding the model with all available tools, MCP Optimizer introduces -two lightweight primitives: +Instead of exposing every backend tool to the model, the optimizer introduces +two lightweight primitives: `find_tool` for semantic search and `call_tool` for +routing. This keeps context small and improves tool selection accuracy. For the +full parameter reference and configuration options, see +[Optimize tool discovery](../guides-vmcp/optimizer.mdx). -1. `find_tool`: Searches for the most relevant tools using hybrid semantic + - keyword search -2. `call_tool`: Routes the selected tool request to the appropriate MCP server +### How it works The workflow is as follows: -1. You send a prompt that requires tool assistance (for example, interacting - with a GitHub repo) +1. You send a prompt that requires tool assistance (for example, fetching a web + page) 2. The assistant calls `find_tool` with relevant keywords extracted from the prompt -3. 
MCP Optimizer returns the most relevant tools (up to 8 by default, but this +3. The optimizer returns the most relevant tools (up to 8 by default, but this is configurable) 4. Only those tools and their descriptions are included in the context sent to the model @@ -56,255 +59,343 @@ The workflow is as follows: ```mermaid flowchart TB - subgraph optimizerGroup["MCP Optimizer group (internal)"] + subgraph vmcpGroup["VirtualMCPServer"] direction TB - optimizer["MCP Optimizer"] + vmcp["vMCP (optimizer enabled)"] end - subgraph target["ToolHive group: default"] + subgraph embedding["EmbeddingServer"] + direction TB + tei["Text Embeddings Inference"] + end + subgraph backends["MCPGroup backends"] direction TB mcp1["MCP server"] mcp2["MCP server"] mcp3["MCP server"] end - client(["Client"]) <-- connects --> optimizerGroup - optimizer <-. discovers/routes .-> target + client(["Client"]) <-- "find_tool / call_tool" --> vmcpGroup + vmcp <-. "semantic search" .-> embedding + vmcp <-. "discovers / routes" .-> backends ``` ## Prerequisites -- One of the following container runtimes: - - macOS: Docker Desktop, Podman Desktop, or Rancher Desktop (using dockerd) - - Windows: Docker Desktop or Rancher Desktop (using dockerd) - - Linux: any container runtime (see [Linux setup](#linux-setup)) -- ToolHive CLI +Before starting this tutorial, make sure you have: + +- A Kubernetes cluster with the ToolHive operator installed (see + [Quickstart: Kubernetes Operator](../guides-k8s/quickstart.mdx)) +- `kubectl` configured to communicate with your cluster +- The [ToolHive CLI](../guides-cli/quickstart.mdx) installed on your local + machine (used in Step 4 to register the endpoint with your MCP clients) +- An MCP client (Visual Studio Code with GitHub Copilot is used in this + tutorial) -## Step 1: Install MCP servers in a ToolHive group +:::warning[ARM64 compatibility] -Before you can use MCP Optimizer, you need to have one or more MCP servers -running in a ToolHive group. 
If you don't have any MCP servers set up yet, -follow these steps: +The default text embeddings inference (TEI) images depend on Intel MKL, which is +x86_64-only. If you are using Apple Silicon or any other ARM64 node (including +kind on macOS), you need to pre-pull the amd64 image before proceeding. See +[ARM64 compatibility](../guides-vmcp/optimizer.mdx#arm64-compatibility) for the +workaround steps. -Run one or more MCP servers in the `default` group. For this tutorial, you can -run the following example MCP servers: +::: -- `github`: Provides tools for interacting with GitHub repositories - ([guide](../guides-mcp/github.mdx?mode=cli)) -- `fetch`: Provides a web search tool to fetch recent news articles -- `time`: Provides a tool to get the current time in various time zones +## Step 1: Create an MCPGroup and deploy backend MCP servers -```bash -thv run github -thv run fetch -thv run time -``` +Create an MCPGroup to organize the backend MCP servers that the optimizer will +index and route to: -See the [Run MCP servers](../guides-cli/run-mcp-servers.mdx) guide for more -details. +```yaml title="mcpgroup.yaml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPGroup +metadata: + name: optimizer-demo + namespace: toolhive-system +spec: + description: Backend servers for the optimizer tutorial +``` -Verify the MCP servers are running: +Apply the resource: ```bash -thv list +kubectl apply -f mcpgroup.yaml ``` -## Step 2: Connect your AI client +Next, deploy two MCP servers in the group. 
Both reference `optimizer-demo` in +the `groupRef` field: + +```yaml {11,30} title="mcpservers.yaml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServer +metadata: + name: fetch + namespace: toolhive-system +spec: + image: ghcr.io/stackloklabs/gofetch/server + transport: streamable-http + proxyPort: 8080 + mcpPort: 8080 + groupRef: optimizer-demo + resources: + limits: + cpu: '100m' + memory: '128Mi' + requests: + cpu: '50m' + memory: '64Mi' +--- +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServer +metadata: + name: osv + namespace: toolhive-system +spec: + image: ghcr.io/stackloklabs/osv-mcp/server + transport: streamable-http + proxyPort: 8080 + mcpPort: 8080 + groupRef: optimizer-demo + resources: + limits: + cpu: '100m' + memory: '128Mi' + requests: + cpu: '50m' + memory: '64Mi' +``` -Connect your AI client to the ToolHive group where the MCP servers are running -(for example, the `default` group). +Apply the resources and wait for both servers to be ready: -:::note +```bash +kubectl apply -f mcpservers.yaml +kubectl get mcpservers -n toolhive-system -w +``` -For best results, connect your client to only the optimized group. If you -connect it to multiple groups, ensure there is no overlap in MCP servers between -the groups to avoid unpredictable behavior. +You should see both servers with `Ready` status before continuing. -::: +:::note -Run the following command to register your AI client with the ToolHive group -where the MCP servers are running (for example, `default`): +If you still have an MCPServer left over from the +[K8s Operator Quickstart](../guides-k8s/quickstart.mdx), you can delete it first +to avoid confusion: ```bash -thv client setup +kubectl delete mcpserver fetch -n toolhive-system ``` -See the [Client configuration](../guides-cli/client-configuration.mdx) guide for -more details. +Then apply the YAML above, which creates a new `fetch` server with the correct +`groupRef`. 
-Open your AI client and verify that it is connected to the correct MCP servers. -If you installed the `github`, `fetch`, and `time` servers, you should see -almost 50 tools available. - -## Step 3: Enable MCP Optimizer +::: -If you are on Linux with native containers, follow the steps below but see -[Linux setup](#linux-setup) for the modified `thv run` command. +## Step 2: Deploy an EmbeddingServer -**Step 3.1: Run the API server** +The optimizer uses semantic search to find relevant tools. This requires an +EmbeddingServer, which runs a text embeddings inference (TEI) server. -MCP Optimizer uses the ToolHive API server to discover MCP servers and manage -client connections. +Create an EmbeddingServer with default settings. This deploys the +`BAAI/bge-small-en-v1.5` model: -You can run the API server in two ways. The simplest is to install and run the -ToolHive UI, which automatically starts the API server in the background. +```yaml title="embedding-server.yaml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: EmbeddingServer +metadata: + name: optimizer-embedding + namespace: toolhive-system +spec: {} +``` -If you prefer to run the API server manually using the CLI, open a dedicated -terminal window and start it on a specific port: +Apply the resource: ```bash -thv serve --port 50100 +kubectl apply -f embedding-server.yaml ``` -Note the port number (`50100` in this example) for use in the next step. - -**Step 3.2: Create a dedicated group and run mcp-optimizer** +Wait for the EmbeddingServer to reach the `Ready` phase before proceeding. The +first startup may take a few minutes while the model downloads: ```bash -# Create the meta group -thv group create optimizer - -# Run mcp-optimizer in the dedicated group -thv run --group optimizer -e TOOLHIVE_PORT=50100 mcp-optimizer +kubectl get embeddingserver optimizer-embedding -n toolhive-system -w ``` -If you are running the API server using the ToolHive UI, omit the -`TOOLHIVE_PORT` environment variable. 
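+
+The empty `spec` applies the defaults described above. As a sketch, if you need
+to pin a specific TEI image version instead of the default tag, the resource
+also accepts an `image` field (the tag below is illustrative):
+
+```yaml
+apiVersion: toolhive.stacklok.dev/v1alpha1
+kind: EmbeddingServer
+metadata:
+  name: optimizer-embedding
+  namespace: toolhive-system
+spec:
+  image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
+```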
+:::info[What's happening?] + +The EmbeddingServer deploys a TEI container that generates vector embeddings +from text. The optimizer uses these embeddings to perform semantic search across +all backend tools, finding the most relevant tools for a given query even when +the exact keywords don't match. -**Step 3.3: Configure your AI client for the meta group** +::: + +## Step 3: Create a VirtualMCPServer with the optimizer + +Create a VirtualMCPServer that aggregates the backend servers and enables the +optimizer. Adding `embeddingServerRef` is the only change needed to enable the +optimizer - sensible defaults are applied automatically: + +```yaml {8-9} title="virtualmcpserver.yaml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: VirtualMCPServer +metadata: + name: optimizer-vmcp + namespace: toolhive-system +spec: + # highlight-start + embeddingServerRef: + name: optimizer-embedding + # highlight-end + incomingAuth: + type: anonymous + serviceType: ClusterIP + config: + groupRef: optimizer-demo + aggregation: + conflictResolution: prefix + conflictResolutionConfig: + prefixFormat: '{workload}_' +``` -Remove your client from the `default` group. For example, to unregister Cursor: +Apply the resource: ```bash -thv client remove cursor --group default +kubectl apply -f virtualmcpserver.yaml ``` -Then, register your client with the `optimizer` group: +Check the status: ```bash -# Run the group setup, select the optimizer group, and then select your client -thv client setup +kubectl get virtualmcpservers -n toolhive-system +``` + +After about 30 seconds, you should see output similar to: -# Verify the configuration -thv client list-registered +```text +NAME PHASE URL BACKENDS AGE READY +optimizer-vmcp Ready http://vmcp-optimizer-vmcp.toolhive-system.svc.cluster.local:4483 2 30s True ``` -:::note +:::info[What's happening?] -Your client now connects only to the `optimizer` group and sees only the -`mcp-optimizer` MCP server. 
+Setting `embeddingServerRef` tells the operator to enable the optimizer on this +VirtualMCPServer. Instead of exposing all backend tools directly, the optimizer +builds a semantic index of tools and exposes only `find_tool` and `call_tool` to +clients. This dramatically reduces the number of tools (and tokens) sent to the +model. ::: -The resulting configuration should look like this: +## Step 4: Connect your AI client -```mermaid -flowchart TB - subgraph meta["ToolHive group: optimizer"] - direction TB - optimizer["mcp-optimizer"] - end - subgraph def["ToolHive group: default"] - direction TB - mcp1["github"] - mcp2["fetch"] - mcp3["time"] - end +The vMCP service runs inside Kubernetes and is not directly reachable by desktop +AI clients. This tutorial uses `kubectl port-forward` because it works with any +cluster, but in production you would typically expose the service through an +Ingress, Gateway API, or LoadBalancer. See +[Expose the service](../guides-vmcp/configuration.mdx#expose-the-service) for +the available options. - client(["Client"]) <-- connects --> meta - optimizer <-. discovers/routes .-> def - client x-. 🚫 .-x def +In a separate terminal, port-forward the vMCP service to your local machine: + +```bash +kubectl port-forward service/vmcp-optimizer-vmcp -n toolhive-system 4483:4483 ``` -## Step 4: Sample prompts +Test the health endpoint: -After you configure and run MCP Optimizer, you can use the same prompts you -would normally use with individual MCP servers. The Optimizer automatically -discovers and routes to appropriate tools. +```bash +curl http://localhost:4483/health +``` -Using the example MCP servers above, here are some sample prompts: +You should see `{"status":"ok"}`. -- "Get the details of GitHub issue 1911 from the stacklok/toolhive repo" -- "List recent PRs from the stacklok/toolhive repo" -- "Fetch the latest news articles about AI" -- "What is the current time in Tokyo?" 
+The ToolHive CLI bridges the remaining gap: it registers the port-forwarded +endpoint as a local workload and automatically updates your MCP client +configuration to point at it. -Watch how MCP Optimizer intelligently selects and routes to the relevant tools -across the connected MCP servers, reducing token usage and improving response -quality. +Register the port-forwarded vMCP endpoint as a ToolHive-managed workload: -To check your token savings, you can ask the optimizer: +```bash +thv run http://localhost:4483/mcp --name optimizer-vmcp +``` -- "How many tokens did I save using MCP Optimizer?" +:::tip -## Linux setup +If you haven't set up client configuration yet, run `thv client setup` to +register your MCP clients. See +[Client configuration](../guides-cli/client-configuration.mdx) for more details. -The setup depends on which type of container runtime you are using. +::: -### VM-based container runtimes +Open your AI client and check its MCP configuration. You should see only two +tools available: `find_tool` and `call_tool`. This confirms the optimizer is +working. -If you are using a container runtime that runs containers inside a virtual -machine (such as Docker Desktop for Linux), the setup is the same as on macOS -and Windows. No additional configuration is needed - follow the steps above. +## Step 5: Test the optimizer -### Native containers (Docker, Podman, Rancher Desktop, and others) +Try these sample prompts to verify the optimizer is routing requests correctly +across both backend MCP servers: -:::note +- "Fetch the contents of https://docs.stacklok.com and summarize the page" +- "Check if the Go package github.com/stacklok/toolhive has any known + vulnerabilities" -Before running the command below, complete the following: +Watch how the optimizer uses `find_tool` to locate the relevant tool across all +backends, then `call_tool` to execute it - all through a single endpoint. -1. 
[Step 1](#step-1-install-mcp-servers-in-a-toolhive-group) - install your MCP - servers -2. [Step 2](#step-2-connect-your-ai-client) - connect your AI client -3. [Step 3](#step-3-enable-mcp-optimizer) - start the API server, create the - optimizer group, and reconfigure your client. When you reach the - `thv run mcp-optimizer` command, use the Linux-specific command below - instead. +To check your token savings, send this prompt to your AI client: -::: +- "How many tokens did I save using MCP Optimizer?" + +## Clean up -Most Linux container runtimes run containers natively on the host kernel. -Because containers run directly on the host kernel, `host.docker.internal` is -not automatically configured - unlike on macOS and Windows, where Docker Desktop -sets it up to let containers reach the host from inside a virtual machine. -Instead, you need to pass a couple of extra flags: +Delete the resources when you're done: ```bash -# Run mcp-optimizer with host networking -thv run --group optimizer --network host \ - -e TOOLHIVE_HOST=127.0.0.1 \ - -e ALLOWED_GROUPS=default \ - mcp-optimizer +kubectl delete virtualmcpserver optimizer-vmcp -n toolhive-system +kubectl delete embeddingserver optimizer-embedding -n toolhive-system +kubectl delete mcpserver fetch osv -n toolhive-system +kubectl delete mcpgroup optimizer-demo -n toolhive-system ``` -- `--network host` lets the container reach the host directly, achieving the - same result as the automatic bridge Docker Desktop sets up on macOS and - Windows. -- `TOOLHIVE_PORT` specifies the port the API server is listening on. If you - started it manually with a custom port in Step 3.1, pass - `-e TOOLHIVE_PORT=` here as well. Omit it if you are using the ToolHive - UI to run the API server. -- `TOOLHIVE_HOST` tells `mcp-optimizer` to connect to `127.0.0.1` instead of - `host.docker.internal`. -- `ALLOWED_GROUPS` tells the optimizer which group's MCP servers to discover, - index, and route requests to. 
Replace `default` with the name of the group you - want to optimize. +To tear down the entire kind cluster from the K8s Quickstart: -To change which groups MCP Optimizer can optimize after initial setup, remove -the workload and run the command again with the updated `ALLOWED_GROUPS` value -(see [Remove a server](../guides-cli/manage-mcp-servers.mdx#remove-a-server)). +```bash +kind delete cluster --name toolhive +``` + +:::note[Legacy: standalone MCP Optimizer] + +The standalone `mcp-optimizer` container can also run alongside ToolHive on +desktop (macOS, Windows, Linux) without Kubernetes. This approach is being +replaced by the Kubernetes-based optimizer described above. For the standalone +setup, see the [MCP Optimizer UI guide](../guides-ui/mcp-optimizer.mdx). + +::: -See [Step 4: Sample prompts](#step-4-sample-prompts) to verify the setup. +## Next steps -## What's next? +Now that you've set up the MCP Optimizer, consider exploring these next steps: -- Experiment with different MCP servers to see how MCP Optimizer enhances tool - selection and reduces token usage -- Explore the [vMCP optimizer](../guides-vmcp/optimizer.mdx) for team-level - optimization in Kubernetes +- [Tune the optimizer](../guides-vmcp/optimizer.mdx#tune-the-optimizer) to + adjust search parameters for your workload +- [Configure authentication](../guides-vmcp/authentication.mdx) for production + deployments +- [Monitor vMCP activity](../guides-vmcp/telemetry-and-metrics.mdx) with + OpenTelemetry tracing and metrics +- [Configure failure handling](../guides-vmcp/failure-handling.mdx) for circuit + breakers and partial failure modes +- Provide feedback on your experience on the + [Stacklok Discord community](https://discord.gg/stacklok) ## Related information -- [Optimize tool discovery in vMCP](../guides-vmcp/optimizer.mdx) - Kubernetes - operator approach +- [Optimize tool discovery](../guides-vmcp/optimizer.mdx) - full parameter + reference, high availability, and ARM64 workaround 
details - [Optimizing LLM context](../concepts/tool-optimization.mdx) - background on tool filtering and context pollution +- [Virtual MCP Server overview](../concepts/vmcp.mdx) - conceptual overview of + vMCP +- [MCP Optimizer UI guide](../guides-ui/mcp-optimizer.mdx) - standalone desktop + approach (legacy) +- [Quickstart: Kubernetes Operator](../guides-k8s/quickstart.mdx) - prerequisite + tutorial From a673ec16464fc09f0f5873c941d1dd23fd0b0ecb Mon Sep 17 00:00:00 2001 From: Alejandro Ponce Date: Thu, 16 Apr 2026 11:35:42 +0300 Subject: [PATCH 2/4] Address PR review feedback for optimizer tutorial Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/toolhive/guides-vmcp/optimizer.mdx | 22 +++++++----- docs/toolhive/tutorials/mcp-optimizer.mdx | 44 ++++++++++------------- 2 files changed, 33 insertions(+), 33 deletions(-) diff --git a/docs/toolhive/guides-vmcp/optimizer.mdx b/docs/toolhive/guides-vmcp/optimizer.mdx index 1b1a9000..42110407 100644 --- a/docs/toolhive/guides-vmcp/optimizer.mdx +++ b/docs/toolhive/guides-vmcp/optimizer.mdx @@ -148,11 +148,21 @@ For the complete field reference, see the :::warning[ARM64 compatibility] -The default TEI CPU images depend on Intel MKL, which is x86_64-only. No -official ARM64 images exist yet. On ARM64 nodes (including Apple Silicon with -kind), you can run the amd64 image under emulation as a workaround. +The default TEI CPU images depend on Intel MKL, which is x86_64-only. Native +ARM64 support has been merged upstream but is not yet included in a published +release. Track the +[TEI GitHub repository](https://github.com/huggingface/text-embeddings-inference) +for updates on ARM64 image availability. + +In the meantime, you can run the amd64 image under emulation on ARM64 nodes. If +you are using Docker Desktop, you must first disable the containerd image store +(**Settings > General > uncheck "Use containerd for pulling and storing +images" > Apply & Restart**). 
Without this, `kind load docker-image` silently +fails because the containerd store preserves multi-arch manifest indexes that +kind cannot import. See +[kind#3795](https://github.com/kubernetes-sigs/kind/issues/3795) for details. -First, pull the amd64 image and load it into your cluster: +Then pull the amd64 image and load it into your cluster: ```bash docker pull --platform linux/amd64 \ @@ -178,10 +188,6 @@ spec: image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 ``` -Native ARM64 support is in progress upstream. Track the -[TEI GitHub repository](https://github.com/huggingface/text-embeddings-inference) -for updates. - ::: ## Tune the optimizer diff --git a/docs/toolhive/tutorials/mcp-optimizer.mdx b/docs/toolhive/tutorials/mcp-optimizer.mdx index f8400c83..2272a83a 100644 --- a/docs/toolhive/tutorials/mcp-optimizer.mdx +++ b/docs/toolhive/tutorials/mcp-optimizer.mdx @@ -31,9 +31,8 @@ Server (vMCP) and an EmbeddingServer for semantic tool search. - How to create an MCPGroup with multiple backend MCP servers - How to deploy an EmbeddingServer for semantic search - How to create a VirtualMCPServer with the optimizer enabled -- How to connect your AI client to the optimized endpoint -- How to verify the optimizer reduces the visible toolset to `find_tool` and - `call_tool` +- How to connect your AI client to the optimized endpoint and verify it exposes + only `find_tool` and `call_tool` ## About MCP Optimizer @@ -94,10 +93,12 @@ Before starting this tutorial, make sure you have: :::warning[ARM64 compatibility] The default text embeddings inference (TEI) images depend on Intel MKL, which is -x86_64-only. If you are using Apple Silicon or any other ARM64 node (including -kind on macOS), you need to pre-pull the amd64 image before proceeding. See -[ARM64 compatibility](../guides-vmcp/optimizer.mdx#arm64-compatibility) for the -workaround steps. +x86_64-only. Native ARM64 support has been merged upstream but is not yet +included in a published release. 
If you are using Apple Silicon or any other +ARM64 nodes (including kind on macOS), you can run the amd64 image under +emulation as a workaround. See the +[EmbeddingServer resource](../guides-vmcp/optimizer.mdx#embeddingserver-resource) +section for the required steps, including a Docker Desktop configuration change. ::: @@ -125,7 +126,7 @@ kubectl apply -f mcpgroup.yaml Next, deploy two MCP servers in the group. Both reference `optimizer-demo` in the `groupRef` field: -```yaml {11,30} title="mcpservers.yaml" +```yaml {11-12,31-32} title="mcpservers.yaml" apiVersion: toolhive.stacklok.dev/v1alpha1 kind: MCPServer metadata: @@ -136,7 +137,8 @@ spec: transport: streamable-http proxyPort: 8080 mcpPort: 8080 - groupRef: optimizer-demo + groupRef: + name: optimizer-demo resources: limits: cpu: '100m' @@ -155,7 +157,8 @@ spec: transport: streamable-http proxyPort: 8080 mcpPort: 8080 - groupRef: optimizer-demo + groupRef: + name: optimizer-demo resources: limits: cpu: '100m' @@ -234,7 +237,7 @@ Create a VirtualMCPServer that aggregates the backend servers and enables the optimizer. 
Adding `embeddingServerRef` is the only change needed to enable the optimizer - sensible defaults are applied automatically: -```yaml {8-9} title="virtualmcpserver.yaml" +```yaml title="virtualmcpserver.yaml" apiVersion: toolhive.stacklok.dev/v1alpha1 kind: VirtualMCPServer metadata: @@ -249,7 +252,8 @@ spec: type: anonymous serviceType: ClusterIP config: - groupRef: optimizer-demo + groupRef: + name: optimizer-demo aggregation: conflictResolution: prefix conflictResolutionConfig: @@ -348,9 +352,10 @@ To check your token savings, send this prompt to your AI client: ## Clean up -Delete the resources when you're done: +Remove the local workload and delete the Kubernetes resources when you're done: ```bash +thv rm optimizer-vmcp kubectl delete virtualmcpserver optimizer-vmcp -n toolhive-system kubectl delete embeddingserver optimizer-embedding -n toolhive-system kubectl delete mcpserver fetch osv -n toolhive-system @@ -363,19 +368,8 @@ To tear down the entire kind cluster from the K8s Quickstart: kind delete cluster --name toolhive ``` -:::note[Legacy: standalone MCP Optimizer] - -The standalone `mcp-optimizer` container can also run alongside ToolHive on -desktop (macOS, Windows, Linux) without Kubernetes. This approach is being -replaced by the Kubernetes-based optimizer described above. For the standalone -setup, see the [MCP Optimizer UI guide](../guides-ui/mcp-optimizer.mdx). 
-
-:::
-
 ## Next steps
 
-Now that you've set up the MCP Optimizer, consider exploring these next steps:
-
 - [Tune the optimizer](../guides-vmcp/optimizer.mdx#tune-the-optimizer) to
   adjust search parameters for your workload
 - [Configure authentication](../guides-vmcp/authentication.mdx) for production
@@ -396,6 +390,6 @@ Now that you've set up the MCP Optimizer, consider exploring these next steps:
 - [Virtual MCP Server overview](../concepts/vmcp.mdx) - conceptual overview of
   vMCP
 - [MCP Optimizer UI guide](../guides-ui/mcp-optimizer.mdx) - standalone desktop
-  approach (legacy)
+  approach without Kubernetes (legacy, being replaced by the vMCP path)
 - [Quickstart: Kubernetes Operator](../guides-k8s/quickstart.mdx) - prerequisite
   tutorial

From fdc70b970e272a87037d593916c48ea386cddf85 Mon Sep 17 00:00:00 2001
From: Alejandro Ponce
Date: Thu, 16 Apr 2026 15:37:10 +0300
Subject: [PATCH 3/4] Use ARM64 TEI images, move groupRef to spec level

Replace the ARM64 emulation workaround with the now-published cpu-arm64-latest
image. Move groupRef from spec.config to spec level in all VirtualMCPServer
examples to match the current CRD. Address remaining PR review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/toolhive/guides-vmcp/optimizer.mdx   | 37 ++++------------------
 docs/toolhive/tutorials/mcp-optimizer.mdx | 29 ++++++++++--------
 2 files changed, 23 insertions(+), 43 deletions(-)

diff --git a/docs/toolhive/guides-vmcp/optimizer.mdx b/docs/toolhive/guides-vmcp/optimizer.mdx
index 42110407..a6b005cf 100644
--- a/docs/toolhive/guides-vmcp/optimizer.mdx
+++ b/docs/toolhive/guides-vmcp/optimizer.mdx
@@ -146,37 +146,10 @@ are:
 For the complete field reference, see the
 [EmbeddingServer CRD specification](../reference/crd-spec.md#apiv1alpha1embeddingserver).
 
-:::warning[ARM64 compatibility]
+:::tip[ARM64 support]
 
-The default TEI CPU images depend on Intel MKL, which is x86_64-only. Native
-ARM64 support has been merged upstream but is not yet included in a published
-release. Track the
-[TEI GitHub repository](https://github.com/huggingface/text-embeddings-inference)
-for updates on ARM64 image availability.
-
-In the meantime, you can run the amd64 image under emulation on ARM64 nodes. If
-you are using Docker Desktop, you must first disable the containerd image store
-(**Settings > General > uncheck "Use containerd for pulling and storing
-images" > Apply & Restart**). Without this, `kind load docker-image` silently
-fails because the containerd store preserves multi-arch manifest indexes that
-kind cannot import. See
-[kind#3795](https://github.com/kubernetes-sigs/kind/issues/3795) for details.
-
-Then pull the amd64 image and load it into your cluster:
-
-```bash
-docker pull --platform linux/amd64 \
-  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
-kind load docker-image \
-  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
-```
-
-The `kind load` command is specific to kind. For other cluster distributions,
-use the equivalent image-loading mechanism (for example, `ctr images import` for
-containerd, or push the image to a registry your cluster can pull from).
-
-Then, pin the image in your EmbeddingServer so the operator uses the pre-pulled
-tag instead of the default `cpu-latest`:
+The default TEI image (`cpu-latest`) is x86_64-only. If you are running on ARM64
+nodes (for example, Apple Silicon), override the image in your EmbeddingServer:
 
 ```yaml title="embedding-server.yaml"
 apiVersion: toolhive.stacklok.dev/v1alpha1
@@ -185,7 +158,7 @@ metadata:
   name: my-embedding
   namespace: toolhive-system
 spec:
-  image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
+  image: ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-latest
 ```
 
 :::
@@ -294,6 +267,8 @@ metadata:
   name: full-vmcp
   namespace: toolhive-system
 spec:
+  groupRef:
+    name: my-tools
   embeddingServerRef:
     name: full-embedding
   groupRef:
diff --git a/docs/toolhive/tutorials/mcp-optimizer.mdx b/docs/toolhive/tutorials/mcp-optimizer.mdx
index 2272a83a..6332ffec 100644
--- a/docs/toolhive/tutorials/mcp-optimizer.mdx
+++ b/docs/toolhive/tutorials/mcp-optimizer.mdx
@@ -31,8 +31,7 @@ Server (vMCP) and an EmbeddingServer for semantic tool search.
 - How to create an MCPGroup with multiple backend MCP servers
 - How to deploy an EmbeddingServer for semantic search
 - How to create a VirtualMCPServer with the optimizer enabled
-- How to connect your AI client to the optimized endpoint and verify it exposes
-  only `find_tool` and `call_tool`
+- How to connect your AI client to the optimized endpoint
 
 ## About MCP Optimizer
 
@@ -90,15 +89,13 @@ Before starting this tutorial, make sure you have:
 - An MCP client (Visual Studio Code with GitHub Copilot is used in this
   tutorial)
 
-:::warning[ARM64 compatibility]
+:::tip[ARM64 support]
 
-The default text embeddings inference (TEI) images depend on Intel MKL, which is
-x86_64-only. Native ARM64 support has been merged upstream but is not yet
-included in a published release. If you are using Apple Silicon or any other
-ARM64 nodes (including kind on macOS), you can run the amd64 image under
-emulation as a workaround. See the
+The default TEI image is x86_64-only. If you are running on ARM64 nodes (for
+example, Apple Silicon with kind), set the `image` field in your EmbeddingServer
+to use the ARM64 image. See
 [EmbeddingServer resource](../guides-vmcp/optimizer.mdx#embeddingserver-resource)
-section for the required steps, including a Docker Desktop configuration change.
+for details.
 
 :::
 
@@ -245,6 +242,8 @@ metadata:
   namespace: toolhive-system
 spec:
   # highlight-start
+  groupRef:
+    name: optimizer-demo
   embeddingServerRef:
     name: optimizer-embedding
   # highlight-end
@@ -252,8 +251,6 @@ spec:
   auth:
     type: anonymous
   serviceType: ClusterIP
   config:
-    groupRef:
-      name: optimizer-demo
     aggregation:
       conflictResolution: prefix
       conflictResolutionConfig:
@@ -350,6 +347,14 @@ To check your token savings, send this prompt to your AI client:
 
 - "How many tokens did I save using MCP Optimizer?"
 
+:::note
+
+With only two backend MCP servers and a small number of tools, the optimizer may
+report minimal or no token savings. The benefit becomes more significant as you
+add more backends and tools to your MCPGroup.
+
+:::
+
 ## Clean up
 
 Remove the local workload and delete the Kubernetes resources when you're done:
@@ -384,7 +389,7 @@ kind delete cluster --name toolhive
 
 ## Related information
 
 - [Optimize tool discovery](../guides-vmcp/optimizer.mdx) - full parameter
-  reference, high availability, and ARM64 workaround details
+  reference, high availability, and ARM64 support details
 - [Optimizing LLM context](../concepts/tool-optimization.mdx) - background on
   tool filtering and context pollution
 - [Virtual MCP Server overview](../concepts/vmcp.mdx) - conceptual overview of

From e6d130adf4e72223d5305577ee3210f47b23c03d Mon Sep 17 00:00:00 2001
From: Alejandro Ponce
Date: Thu, 16 Apr 2026 16:18:02 +0300
Subject: [PATCH 4/4] Inline ARM64 guidance in EmbeddingServer step

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/toolhive/tutorials/mcp-optimizer.mdx | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/docs/toolhive/tutorials/mcp-optimizer.mdx b/docs/toolhive/tutorials/mcp-optimizer.mdx
index 6332ffec..337db246 100644
--- a/docs/toolhive/tutorials/mcp-optimizer.mdx
+++ b/docs/toolhive/tutorials/mcp-optimizer.mdx
@@ -89,16 +89,6 @@ Before starting this tutorial, make sure you have:
 - An MCP client (Visual Studio Code with GitHub Copilot is used in this
   tutorial)
 
-:::tip[ARM64 support]
-
-The default TEI image is x86_64-only. If you are running on ARM64 nodes (for
-example, Apple Silicon with kind), set the `image` field in your EmbeddingServer
-to use the ARM64 image. See
-[EmbeddingServer resource](../guides-vmcp/optimizer.mdx#embeddingserver-resource)
-for details.
-
-:::
-
 ## Step 1: Create an MCPGroup and deploy backend MCP servers
 
 Create an MCPGroup to organize the backend MCP servers that the optimizer will
@@ -195,7 +185,8 @@ The optimizer uses semantic search to find relevant tools. This requires an
 EmbeddingServer, which runs a text embeddings inference (TEI) server.
 
 Create an EmbeddingServer with default settings. This deploys the
-`BAAI/bge-small-en-v1.5` model:
+`BAAI/bge-small-en-v1.5` model. If you are running on ARM64 nodes (for example,
+Apple Silicon with kind), uncomment the `image` line to use the ARM64 build:
 
 ```yaml title="embedding-server.yaml"
 apiVersion: toolhive.stacklok.dev/v1alpha1
@@ -203,7 +194,9 @@ kind: EmbeddingServer
 metadata:
   name: optimizer-embedding
   namespace: toolhive-system
-spec: {}
+spec:
+  # Uncomment for Apple Silicon or other ARM64 platforms
+  # image: ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-latest
 ```
 
 Apply the resource:
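
The excerpt above ends at the apply step. For reference, this is what the EmbeddingServer manifest from that step looks like with the ARM64 override uncommented, a sketch using the tutorial's resource names; on x86_64 nodes, leave `spec` empty so the operator uses the default `cpu-latest` image:

```yaml
# EmbeddingServer manifest with the ARM64 image override active
# (names taken from the tutorial: optimizer-embedding in toolhive-system)
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: EmbeddingServer
metadata:
  name: optimizer-embedding
  namespace: toolhive-system
spec:
  # Overrides the default cpu-latest image, which is x86_64-only
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-latest
```

Applying it follows the usual CRD workflow, for example `kubectl apply -f embedding-server.yaml`, after which the operator deploys a TEI server running the `BAAI/bge-small-en-v1.5` model.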