**File: .claude/skills/test-docs-dryrun/SKILL.md** (new file, 56 additions)
---
name: test-docs-dryrun
description: >
Fast CRD schema validation for ToolHive documentation. Extracts all YAML blocks from K8s and vMCP docs, runs kubectl apply --dry-run=server to catch field name errors, type mismatches, and schema drift. No cluster resources are created. Use for: "dry-run the docs", "validate the YAML", "check for schema issues", "run a quick doc check", or after any CRD/API change to catch doc rot. Requires a Kubernetes cluster with ToolHive CRDs installed.
---

# Test Docs - Dry Run

Fast CRD schema validation for all ToolHive documentation YAML blocks. Extracts every YAML block containing `toolhive.stacklok.dev` resources, runs `kubectl apply --dry-run=server` on each, and reports pass/fail in a per-doc table. No resources are created. Under 5 minutes for all docs.

## Workflow

Follow these steps in order. Each step references a procedure file in the `procedures/` directory - read that file and follow its instructions exactly. Do NOT improvise inline bash when a procedure exists.

### 1. Determine scope

Ask what to validate:

- **Kubernetes Operator** - all `.mdx` in `docs/toolhive/guides-k8s/`
- **Virtual MCP Server** - all `.mdx` in `docs/toolhive/guides-vmcp/`
- **All** - both sections

### 2. Cluster setup

Follow `procedures/cluster-setup.md`. Key points:

- Check for existing `toolhive` kind cluster FIRST
- Only create if missing
- Only CRDs needed, not the operator
- Default to KEEPING the cluster after the run

### 3. Extract YAML

Follow `procedures/extract.md`. Key points:

- Always use `scripts/extract-yaml.py` with the `--prefix` flag
- Use `--prefix k8s` for K8s docs, `--prefix vmcp` for vMCP docs
- This prevents filename collisions between sections

### 4. Validate

Follow `procedures/validate.md`. Key points:

- Write results to a CSV file (not bash variables)
- Classify as pass/fail/expected/skip per the procedure
- One CSV line per YAML block

### 5. Report

Follow `procedures/report.md`. Key points:

- Always output the per-doc breakdown table to the user
- The table is the PRIMARY output of this skill
- Bold non-zero fail counts
- List real failures with error messages and fix suggestions
- Write results to `TEST_DRYRUN_*.md`
**File: .claude/skills/test-docs-dryrun/procedures/cluster-setup.md** (new file, 42 additions)
# Cluster setup

How to manage the kind cluster for dry-run validation. The goal is to avoid unnecessary cluster creation/deletion between runs.

## Check for existing cluster

Always check first. Do NOT create a cluster without checking.

```bash
kind get clusters 2>/dev/null | grep -q "^toolhive$"
```

If the cluster exists, verify CRDs are installed:

```bash
kubectl get crd --context kind-toolhive 2>/dev/null | grep -c toolhive.stacklok.dev
```

If CRDs are present (count > 0), skip straight to extraction. No setup needed.
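The two checks above can be folded into a single guard. A minimal sketch, assuming kind's default `kind-<cluster-name>` context naming:

```shell
# Sketch: decide whether cluster setup is needed. Assumes the kind-toolhive
# context name that kind derives from the cluster name.
setup_needed=true
if kind get clusters 2>/dev/null | grep -q '^toolhive$' &&
   [ "$(kubectl get crd --context kind-toolhive 2>/dev/null \
       | grep -c toolhive.stacklok.dev)" -gt 0 ]; then
  setup_needed=false
fi
echo "setup_needed=$setup_needed"
```

If `setup_needed` is `false`, skip straight to extraction; otherwise proceed to cluster creation below.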

## Create cluster only if missing

If no `toolhive` cluster exists:

```bash
kind create cluster --name toolhive
helm upgrade --install toolhive-operator-crds \
  oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
  -n toolhive-system --create-namespace
```

The operator is NOT needed for dry-run validation. Only install CRDs.

## After the run: keep the cluster

Default to keeping the cluster after the run. Do NOT delete it unless the user explicitly asks. This avoids the 2-minute cluster creation penalty on the next run.

If the user asks to clean up, delete with:

```bash
kind delete cluster --name toolhive
```
**File: .claude/skills/test-docs-dryrun/procedures/extract.md** (new file, 61 additions)
# Extract YAML blocks

How to extract ToolHive CRD YAML blocks from documentation files. Always use the extraction script - do NOT write inline bash to parse YAML.

## Commands

Use the `--prefix` flag to avoid filename collisions between sections. This is critical because both sections have files with the same name (e.g., `telemetry-and-metrics.mdx`).

```bash
SKILL_PATH="<skill-path>"
YAML_DIR="$(mktemp -d)/yaml"
mkdir -p "$YAML_DIR"

# K8s Operator docs
for f in docs/toolhive/guides-k8s/*.mdx; do
  python3 "$SKILL_PATH/scripts/extract-yaml.py" "$f" "$YAML_DIR" --prefix k8s
done

# vMCP docs
for f in docs/toolhive/guides-vmcp/*.mdx; do
  python3 "$SKILL_PATH/scripts/extract-yaml.py" "$f" "$YAML_DIR" --prefix vmcp
done
```

Replace `<skill-path>` with the absolute path to this skill's directory.

## Output

The script writes one file per YAML block:

```text
k8s-auth-k8s_0.yaml
k8s-auth-k8s_1.yaml
k8s-run-mcp-k8s_0.yaml
vmcp-authentication_0.yaml
vmcp-telemetry-and-metrics_0.yaml
```

The prefix ensures `k8s-telemetry-and-metrics_0.yaml` and `vmcp-telemetry-and-metrics_0.yaml` don't collide.
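A quick post-extraction sanity check can confirm the prefixes did their job. A sketch, demonstrated on hypothetical sample filenames rather than a real `$YAML_DIR`:

```shell
# Demo: count extracted blocks per section prefix (real runs point at $YAML_DIR).
demo=$(mktemp -d)
touch "$demo/k8s-telemetry-and-metrics_0.yaml" \
      "$demo/vmcp-telemetry-and-metrics_0.yaml"

k8s_count=$(find "$demo" -name 'k8s-*.yaml' | wc -l)
vmcp_count=$(find "$demo" -name 'vmcp-*.yaml' | wc -l)
echo "k8s blocks: $k8s_count, vmcp blocks: $vmcp_count"
rm -rf "$demo"
```

Both counts being non-zero (and every file carrying a `k8s-` or `vmcp-` prefix) means no cross-section collision is possible.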

## For single section

If only validating one section, use just the relevant loop:

```bash
# K8s only
for f in docs/toolhive/guides-k8s/*.mdx; do
  python3 "$SKILL_PATH/scripts/extract-yaml.py" "$f" "$YAML_DIR" --prefix k8s
done

# vMCP only
for f in docs/toolhive/guides-vmcp/*.mdx; do
  python3 "$SKILL_PATH/scripts/extract-yaml.py" "$f" "$YAML_DIR" --prefix vmcp
done
```

## For single page

```bash
python3 "$SKILL_PATH/scripts/extract-yaml.py" <file-path> "$YAML_DIR" --prefix single
```
**File: .claude/skills/test-docs-dryrun/procedures/report.md** (new file, 75 additions)
# Report results

How to produce the per-doc breakdown table from the results CSV.

## Build the table from CSV

The validate step writes a CSV at `$RESULTS` with columns: `doc,filename,result,detail`

Aggregate it into the per-doc table using this bash:

```bash
RESULTS="<path-to-results.csv>"

echo "| Doc | Section | Blocks | Pass | Fail | Expected | Skip |"
echo "|-----|---------|--------|------|------|----------|------|"

TBLOCKS=0; TPASS=0; TFAIL=0; TEXPECTED=0; TSKIP=0

for doc in $(cut -d',' -f1 "$RESULTS" | sort -u); do
  blocks=$(grep -c "^$doc," "$RESULTS")
  pass=$(grep -c "^$doc,[^,]*,pass" "$RESULTS" || true)
  fail=$(grep -c "^$doc,[^,]*,fail" "$RESULTS" || true)
  expected=$(grep -c "^$doc,[^,]*,expected" "$RESULTS" || true)
  skip=$(grep -c "^$doc,[^,]*,skip" "$RESULTS" || true)

  # Determine section from prefix
  section="K8s"
  echo "$doc" | grep -q "^vmcp-" && section="vMCP"

  # Bold non-zero fail counts
  fail_display="$fail"
  [ "$fail" -gt 0 ] && fail_display="**$fail**"

  echo "| ${doc} | ${section} | ${blocks} | ${pass} | ${fail_display} | ${expected} | ${skip} |"

  TBLOCKS=$((TBLOCKS + blocks))
  TPASS=$((TPASS + pass))
  TFAIL=$((TFAIL + fail))
  TEXPECTED=$((TEXPECTED + expected))
  TSKIP=$((TSKIP + skip))
done

echo "| **TOTAL** | | **$TBLOCKS** | **$TPASS** | **$TFAIL** | **$TEXPECTED** | **$TSKIP** |"
```

## Table rules

- Include every doc that had at least 1 YAML block extracted
- Omit docs with 0 blocks
- Group K8s docs first (prefix `k8s-`), then vMCP docs (prefix `vmcp-`)
- Sort alphabetically within each section
- Bold the TOTAL row
- Bold any non-zero Fail count

## After the table

List each real failure with:

```text
### Real failures

**k8s-mcp-server-entry_1.yaml**: `strict decoding error: unknown field "spec.remoteURL"`
Fix: Change `remoteURL` to `remoteUrl`

**vmcp-optimizer_3.yaml**: `strict decoding error: unknown field "spec.modelCache.storageSize"`
Fix: Change `storageSize` to `size`
```
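The failure list can be pulled straight from the CSV. A sketch, shown against a tiny hypothetical results file (it reads only the fourth comma-separated field, so it assumes the validator keeps commas out of the detail column):

```shell
# Sketch: print real failures from a results CSV (doc,filename,result,detail).
RESULTS=$(mktemp)
cat > "$RESULTS" <<'EOF'
k8s-auth,k8s-auth_0.yaml,pass,
k8s-mcp-server-entry,k8s-mcp-server-entry_1.yaml,fail,unknown field "spec.remoteURL"
EOF

failures=$(awk -F',' '$3 == "fail" { printf "**%s**: `%s`\n", $2, $4 }' "$RESULTS")
echo "$failures"
rm -f "$RESULTS"
```

The fix suggestion for each failure still has to be written by hand after reading the error.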

## Write to file

Always write the table and failure details to a markdown file:

- Single page: `TEST_DRYRUN_<PAGE_NAME>.md`
- Section: `TEST_DRYRUN_<SECTION_NAME>.md`
- All: `TEST_DRYRUN_ALL.md`
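One way to derive the filename from the scope can be sketched as follows (the `scope` variable and its values are illustrative, not part of the skill):

```shell
# Sketch: map a validation scope to the report filename convention above.
scope="guides-k8s"   # e.g. "guides-k8s", "guides-vmcp", or "all"
case "$scope" in
  all) report="TEST_DRYRUN_ALL.md" ;;
  *)   report="TEST_DRYRUN_$(echo "$scope" | tr 'a-z-' 'A-Z_').md" ;;
esac
echo "$report"
```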
**File: .claude/skills/test-docs-dryrun/procedures/validate.md** (new file, 76 additions)
# Validate YAML blocks

How to dry-run validate extracted YAML blocks efficiently.

## Important: use a script file, not inline bash

Long-running `for` loops with nested `if/elif/else` and subshells do NOT work reliably as inline Bash tool calls - they get backgrounded or time out. Always write the validation loop to a temporary `.sh` file and execute it with `bash`.

## Step 1: Write the validation script

Write the following to a temp file (e.g., `$YAML_DIR/../validate.sh`). Replace `<yaml-dir>` with the actual YAML directory path.

```bash
#!/bin/bash
set -euo pipefail

YAML_DIR="<yaml-dir>"
RESULTS="$YAML_DIR/../results.csv"
> "$RESULTS"

for f in "$YAML_DIR"/*.yaml; do
  bn=$(basename "$f")
  doc=$(echo "$bn" | sed 's/_[0-9]*\.yaml$//')

  # Skip incomplete resources (snippets without kind/apiVersion/name)
  if ! grep -q 'kind:' "$f" || ! grep -q 'apiVersion:' "$f" || ! grep -q 'name:' "$f"; then
    echo "$doc,$bn,skip,incomplete fragment" >> "$RESULTS"
    continue
  fi

  OUT=$(kubectl apply --dry-run=server -f "$f" 2>&1) || true

  if echo "$OUT" | grep -q 'created (server dry run)'; then
    echo "$doc,$bn,pass," >> "$RESULTS"
  elif echo "$OUT" | grep -q 'namespaces.*not found'; then
    echo "$doc,$bn,pass,namespace missing but schema valid" >> "$RESULTS"
  elif echo "$OUT" | grep -qE '<SERVER_NAME>|<NAMESPACE>|<SERVER_URL>'; then
    echo "$doc,$bn,expected,placeholder name" >> "$RESULTS"
  elif echo "$OUT" | grep -qE 'incomingAuth: Required|compositeTools.*steps: Required'; then
    echo "$doc,$bn,expected,partial snippet" >> "$RESULTS"
  else
    # Flatten newlines and strip commas so the error fits in one CSV field
    ERR=$(echo "$OUT" | tr '\n' ' ' | tr ',' ';' | head -c 200)
    echo "$doc,$bn,fail,$ERR" >> "$RESULTS"
  fi
done

echo "Done: $(wc -l < "$RESULTS") lines"
```

## Step 2: Run the script

Use the Write tool to create the script file, then execute it:

```bash
bash "$YAML_DIR/../validate.sh"
```

This runs in a single process and completes reliably regardless of how many YAML blocks are validated.

## Result classification

| Result | Meaning | Action needed? |
| ---------- | ---------------------------------------- | ----------------- |
| `pass` | Schema valid (or only namespace missing) | No |
| `expected` | Placeholder name or partial snippet | No |
| `fail` | Real schema error | Yes - fix the doc |
| `skip` | Incomplete fragment (no kind/name) | No |
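The classification counts are easy to tally from the CSV itself before building the full report table. A sketch against a hypothetical three-line results file:

```shell
# Sketch: tally results by classification (column 3 of the CSV).
RESULTS=$(mktemp)
printf '%s\n' \
  'doc-a,doc-a_0.yaml,pass,' \
  'doc-a,doc-a_1.yaml,fail,unknown field' \
  'doc-b,doc-b_0.yaml,pass,' > "$RESULTS"

tally=$(cut -d',' -f3 "$RESULTS" | sort | uniq -c | sort -rn)
echo "$tally"
rm -f "$RESULTS"
```

A quick glance at this tally tells you whether the detailed failure listing is needed at all.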

## Important notes

- ALWAYS write to a script file first, then execute with `bash`
- Use `|| true` after `kubectl apply` to prevent `set -e` from exiting on validation failures
- Check for `created (server dry run)` in output rather than exit code, because some versions of kubectl return non-zero even on dry-run success with warnings
- Use a CSV file for results, not bash variables or associative arrays
- Write one line per YAML block with: doc name, filename, result, detail
- The CSV is consumed by the report procedure to build the table
**File: .claude/skills/test-docs-dryrun/scripts/check-prereqs.sh** (new file, 76 additions)
```bash
#!/bin/bash
# SPDX-FileCopyrightText: Copyright 2026 Stacklok, Inc.
# SPDX-License-Identifier: Apache-2.0

# Check prerequisites for dry-run documentation validation.
# Only requires kubectl and a cluster with CRDs installed (no operator needed).

set -euo pipefail

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m'

ERRORS=0
WARNINGS=0

echo "=== Dry-Run Validation Prerequisites ==="
echo ""

# Check kubectl
if command -v kubectl &>/dev/null; then
  echo -e "${GREEN}[OK]${NC} kubectl available"
else
  echo -e "${RED}[MISSING]${NC} kubectl (required)"
  ERRORS=$((ERRORS + 1))
fi

# Check cluster connectivity
echo ""
echo "--- Kubernetes cluster ---"
CLUSTER_REACHABLE=false
if kubectl cluster-info &>/dev/null; then
  CONTEXT=$(kubectl config current-context 2>/dev/null || echo "unknown")
  echo -e "${GREEN}[OK]${NC} Cluster reachable (context: $CONTEXT)"
  CLUSTER_REACHABLE=true
else
  echo -e "${RED}[MISSING]${NC} No Kubernetes cluster reachable"
  ERRORS=$((ERRORS + 1))
fi

# Check ToolHive CRDs installed (only if cluster is reachable)
echo ""
echo "--- ToolHive CRDs ---"
if [ "$CLUSTER_REACHABLE" = true ]; then
  CRD_COUNT=$(kubectl get crd 2>/dev/null | grep -c toolhive.stacklok.dev || true)
  if [ "$CRD_COUNT" -gt 0 ]; then
    echo -e "${GREEN}[OK]${NC} $CRD_COUNT ToolHive CRDs installed"
  else
    echo -e "${RED}[MISSING]${NC} No ToolHive CRDs found"
    echo "  Install with: helm upgrade --install toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds -n toolhive-system --create-namespace"
    ERRORS=$((ERRORS + 1))
  fi
else
  echo -e "${YELLOW}[SKIPPED]${NC} Cannot check CRDs (cluster unreachable)"
fi
```
> **Copilot AI (Apr 15, 2026), on lines +33 to +56:** CRD detection is run even when the cluster is unreachable; in that case `CRD_COUNT` becomes 0 and the script reports "No ToolHive CRDs found" with a Helm install hint, which is misleading because the real issue is cluster connectivity. Gate the CRD check on successful cluster connectivity (or detect and report connectivity errors separately) so the output points to the correct fix.
>
> **Contributor Author:** Good catch - fixed in 6d71bc2. The CRD check is now gated on a `CLUSTER_REACHABLE` flag. When the cluster is unreachable, it shows `[SKIPPED] Cannot check CRDs (cluster unreachable)` instead of the misleading Helm install hint.
```bash
# Check python3 for extraction script
echo ""
echo "--- Tools ---"
if command -v python3 &>/dev/null; then
  echo -e "${GREEN}[OK]${NC} python3 available"
else
  echo -e "${RED}[MISSING]${NC} python3 (required for YAML extraction)"
  ERRORS=$((ERRORS + 1))
fi

echo ""
echo "=== Summary ==="
if [ "$ERRORS" -gt 0 ]; then
  echo -e "${RED}$ERRORS error(s)${NC}"
  exit 1
else
  echo -e "${GREEN}All prerequisites met${NC}"
  exit 0
fi
```