diff --git a/docs/cli/Guides/swarm-vllm-s3.md b/docs/cli/Guides/swarm-vllm-s3.md new file mode 100644 index 00000000..5861d465 --- /dev/null +++ b/docs/cli/Guides/swarm-vllm-s3.md @@ -0,0 +1,217 @@ +--- +id: "swarm-vllm-s3" +title: "Super Swarm: LLM Deployment with S3 Storage" +slug: "/guides/swarm-vllm-s3" +sidebar_position: 21 +--- + +This guide provides step-by-step instructions for deploying an LLM on Super Swarm using S3 object storage, with Qwen2.5 as an example. Modify the deployment script if you want to launch another model. + +## Prerequisites + +- [kubectl](https://kubernetes.io/docs/tasks/tools/) +- [helm](https://helm.sh/docs/intro/install/) +- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) +- A domain to construct an API endpoint hostname + +## 1. Download the deployment script + +Download and rename the deployment script [`deploy_qwen_s3.sh`](/files/deploy_qwen_s3.sh). + +In the script, find `BASE_DOMAIN="${BASE_DOMAIN:-superprotocol.com}"` and replace `superprotocol.com` with your domain. + +Modify the deployment configuration and `vllmConfig` if you are deploying another model. + +## 2. Sign in to Super Swarm + +In the Super Swarm dashboard, sign in using either Google (recommended) or MetaMask. + +
+ +## 3. Create a service account + +**3.1.** Open **Service Accounts** and click **Create Service Account**: + + +
+
+ +**3.2.** Provide a name and click **Create**: + + +
+
+ +**3.3.** Copy and save both Access and Secret keys and click **Done**: + + +
+ +## 4. Create a bucket + +**4.1.** Open **Object Storage** and click **Create Bucket**: + + +
+
+ +**4.2.** Provide a name for the bucket and click **Create Bucket**: + + +
+
+ +## 5. Provide access to the bucket + +**5.1.** In Object Storage, click **Policy Rules**: + + +
+
+ +**5.2.** Click **+Grant Access** in the top-right corner, select a Service Account, and click **Grant Access**: + + +
+ +## 6. Download a model from Hugging Face + +This guide uses Qwen2.5 as an example. If you already have the model, skip this step. + +**6.1.** Install [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/installation). + +**6.2.** Download the model: + +```shell +hf download Qwen/Qwen2.5-1.5B-Instruct --local-dir ./qwen-1.5b +``` + +## 7. Upload the model + +**7.1.** In **Object Storage**, click **Connect Info** to see your S3 Endpoint, Bucket ID, and the region: + + +
+
+ + +
+
+ +**7.2.** Export the following variables to set up the connection: + +```shell +export AWS_ACCESS_KEY_ID="" +export AWS_SECRET_ACCESS_KEY="" +export AWS_DEFAULT_REGION="us-east-1" +export S3_ENDPOINT="" +export S3_BUCKET="" +``` + +Replace: +- `` and `` with the keys you obtained in [Step 3](/cli/guides/swarm-vllm-s3#3-create-a-service-account). +- `` and `` with corresponding values in the **Connect Info**. + +Ensure `AWS_DEFAULT_REGION` matches the region in the **Connect Info**. + +**7.3.** Upload the model: + +```shell +aws s3 sync ./qwen-1.5b s3://${S3_BUCKET}/models/qwen-1.5b/ \ + --endpoint-url ${S3_ENDPOINT} \ + --exclude ".cache/*" +``` + +**7.4.** Check if the model was uploaded successfully: + +```shell +aws s3 ls s3://${S3_BUCKET}/models/qwen-1.5b/ \ + --endpoint-url ${S3_ENDPOINT} +``` + +## 8. Create a Kubernetes cluster + +**8.1.** Go to **Kubernetes** and click **Create Cluster**: + + +
+
+ +**8.2.** Provide a name, add a **GPU** to the cluster, allocate resources, and click **Create Cluster**: + + +
+ +## 9. Download the cluster configuration file + + +
+ +## 10. Point `kubectl` to the configuration file + +Execute the following command: + +```shell +export KUBECONFIG=-kubeconfig.yaml +``` + +Replace `-kubeconfig.yaml` with the name of the cluster configuration file. + +## 11. Set the API key + +Choose a password that will protect your API endpoints. Execute the following command and type your chosen secret (characters won't be displayed): + +```shell +read -rs API_KEY && export API_KEY +``` + +## 12. Deploy the model + +Execute the deployment script: + +```shell +bash deploy_qwen_s3.sh +``` + +## 13. Confirm DNS records + +Back in the Super Swarm dashboard, go to **Ingresses** and check the hostname listed there: + + +
+
+ +At your DNS provider, add a CNAME record pointing to the hostname and a TXT record for domain verification. + +Ensure the statuses have changed to **Verified** and **Delegated**. This may take a couple of minutes. + + +
+ +## 14. Publish the cluster + +Go to **Kubernetes** and publish the cluster. + + +
+ + +## 15. Send a test request + +In the following test request, replace `` with your domain. + +```shell +curl https://qwen-vllm-s3./v1/chat/completions \ + -H "Authorization: Bearer ${API_KEY}" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "qwen", + "messages": [{"role": "user", "content": "Hello! What model are you?"}], + "max_tokens": 100 + }' +``` + +## Support + +If you have any issues or questions, contact Super Protocol on [Discord](https://discord.gg/superprotocol) or via the [contact form](https://superprotocol.zendesk.com/hc/en-us/requests/new). \ No newline at end of file diff --git a/docs/cli/Guides/swarm-vllm.md b/docs/cli/Guides/swarm-vllm.md index 52f4a9ae..0cf02b08 100644 --- a/docs/cli/Guides/swarm-vllm.md +++ b/docs/cli/Guides/swarm-vllm.md @@ -1,50 +1,63 @@ --- id: "swarm-vllm" -title: "vLLM on Super Swarm" +title: "Super Swarm: LLM Deployment" slug: "/guides/swarm-vllm" sidebar_position: 20 --- -This guide provides step-by-step instructions for deploying MedGemma and Apertus on Super Swarm using vLLM. +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +This guide provides step-by-step instructions for deploying an LLM on Super Swarm using [vLLM](https://github.com/vllm-project/vllm), with MedGemma and Apertus as examples. Modify the deployment script if you want to launch another model. ## Prerequisites - [kubectl](https://kubernetes.io/docs/tasks/tools/) - [helm](https://helm.sh/docs/intro/install/) -- A domain +- A domain to construct API endpoint hostnames - For [MedGemma](https://huggingface.co/google/medgemma-1.5-4b-it): a Hugging Face token from an account that has already accepted the model's terms -Also, download and rename deployment scripts: +## 1. 
Download and update deployment scripts + + + + Download and rename the deployment script [`deploy_medgemma_official.sh`](/files/deploy_medgemma_official.sh) + + + Download and rename the deployment script [`deploy_apertus_official.sh`](/files/deploy_apertus_official.sh) + + + +In the script, find `BASE_DOMAIN="${BASE_DOMAIN:-superprotocol.com}"` and replace `superprotocol.com` with your domain. -- [`deploy_medgemma_official.sh`](/files/deploy_medgemma_official.sh) -- [`deploy_apertus_official.sh`](/files/deploy_apertus_official.sh) +Modify the deployment parameters if you are using another model. -## 1. Sign in to Super Swarm +## 2. Sign in to Super Swarm -In the Super Swarm dashboard, sign in using MetaMask: +In the Super Swarm dashboard, sign in using either Google (recommended) or MetaMask. - +
-## 2. Create a Kubernetes cluster +## 3. Create a Kubernetes cluster -2.1. Go to **Kubernetes** and press **Create Cluster**: +**3.1.** Go to **Kubernetes** and click **Create Cluster**: - +

-2.2. Add a GPU to the cluster, allocate resources, and press **Create Cluster**: +**3.2.** Provide a name, add a **GPU** to the cluster, allocate resources, and click **Create Cluster**: - +
-## 3. Download the cluster configuration file +## 4. Download the cluster configuration file - +
-## 4. Point `kubectl` to the configuration file +## 5. Point `kubectl` to the configuration file Execute the following command: @@ -54,13 +67,9 @@ export KUBECONFIG=-kubeconfig.yaml Replace `-kubeconfig.yaml` with the name of the downloaded configuration file. -## 5. Update the scripts - -In both scripts (`deploy_medgemma_official.sh` and `deploy_apertus_official.sh`), find `BASE_DOMAIN="${BASE_DOMAIN:-monai-swarm.win}"` and replace `monai-swarm.win` with your domain. - ## 6. Set the API key -Choose any password that will protect your API endpoints. Execute the following command and type your chosen secret (characters won't be displayed): +Choose a password that will protect your API endpoints. Execute the following command and type your chosen secret (characters won't be displayed): ```shell read -rs API_KEY && export API_KEY @@ -68,45 +77,44 @@ read -rs API_KEY && export API_KEY ## 7. Deploy the model -### Apertus - -```shell -bash deploy_apertus_official.sh -``` - -The deployment usually takes 5-7 minutes. - -A working Apertus config is already set in the script: - -``` -dtype=bfloat16 -max-model-len=32768 -gpu-memory-utilization=0.55 -max-num-seqs=8 -max-num-batched-tokens=4096 -``` - -### MedGemma - -```shell -export HF_TOKEN=hf_xxx -bash deploy_medgemma_official.sh -``` - -Replace `hf_xxx` with an HF_TOKEN. - -Alternatively, create a `.hf_token` file with the token next to `deploy_medgemma_official.sh`; the script will read it automatically. - -A working MedGemma config is already set in the script: - -``` -dtype=bfloat16 -max-model-len=8192 -gpu-memory-utilization=0.40 ---mm-processor-cache-gb 1 -max-num-seqs=4 -max-num-batched-tokens=2048 -``` + + + ```shell + export HF_TOKEN=hf_xxx + bash deploy_medgemma_official.sh + ``` + + Replace `hf_xxx` with your Hugging Face access token. + + Alternatively, create a `.hf_token` file with the token next to `deploy_medgemma_official.sh`; the script will read it automatically.
+ + A working MedGemma configuration is already set in the script: + + ``` + dtype=bfloat16 + max-model-len=8192 + gpu-memory-utilization=0.40 + --mm-processor-cache-gb 1 + max-num-seqs=4 + max-num-batched-tokens=2048 + ``` + + + ```shell + bash deploy_apertus_official.sh + ``` + + A working Apertus configuration is already set in the script: + + ``` + dtype=bfloat16 + max-model-len=32768 + gpu-memory-utilization=0.55 + max-num-seqs=8 + max-num-batched-tokens=4096 + ``` + + ## 8. Check Kubernetes @@ -126,58 +134,72 @@ Expected output: Back in the Super Swarm dashboard, go to **Ingresses** and note the two hostnames listed there. - +

For each hostname, add a CNAME record pointing to it and a TXT record for domain verification at your DNS provider. -## 10. Publish the cluster - -In the Super Swarm dashboard, go to **Kubernetes** and publish the cluster. +Back in the Super Swarm dashboard, ensure the statuses are **Verified** and **Delegated**. This may take a couple of minutes. - +
-## 11. Send test requests - -In the test requests below, replace: - -- `` with your domain. -- `` with the key you set in [Step 6](/cli/guides/swarm-vllm#6-set-the-api-key). +## 10. Publish the cluster -### Apertus +Go to **Kubernetes** and publish the cluster. -```shell -curl https://apertus-vllm./v1/completions \ - -H 'Authorization: Bearer ' \ - -H 'Content-Type: application/json' \ - -d '{ - "model": "swiss-ai/Apertus-8B-2509", - "prompt": "Write a concise technical summary of Kubernetes GPU scheduling.", - "temperature": 0, - "max_tokens": 200 - }' -``` + +
-### MedGemma +## 11. Send test requests -```shell -curl https://medgemma-vllm./v1/chat/completions \ - -H 'Authorization: Bearer ' \ - -H 'Content-Type: application/json' \ - -d '{ - "model": "google/medgemma-1.5-4b-it", - "messages": [ - { - "role": "user", - "content": [ - {"type": "text", "text": "Describe this image briefly."}, - {"type": "image_url", "image_url": {"url": "data:image/png;base64,PASTE_BASE64_HERE"}} - ] - } - ], - "temperature": 0, - "max_tokens": 120 - }' -``` \ No newline at end of file + + + In the following test request, replace: + + - `` with your domain. + - `` with a base64-encoded image. To encode an image without line breaks, run `base64 -i your-image.png` on macOS or `base64 -w 0 your-image.png` on Linux. + + Ensure that `image/png` matches your actual file type; for example, use `image/jpeg` for JPG files. + + ```shell + curl https://medgemma-vllm./v1/chat/completions \ + -H "Authorization: Bearer ${API_KEY}" \ + -H 'Content-Type: application/json' \ + -d '{ + "model": "google/medgemma-1.5-4b-it", + "messages": [ + { + "role": "user", + "content": [ + {"type": "text", "text": "Describe this image briefly."}, + {"type": "image_url", "image_url": {"url": "data:image/png;base64,"}} + ] + } + ], + "temperature": 0, + "max_tokens": 120 + }' + ``` + + + In the following test request, replace `` with your domain. + + ```shell + curl https://apertus-vllm./v1/completions \ + -H "Authorization: Bearer ${API_KEY}" \ + -H 'Content-Type: application/json' \ + -d '{ + "model": "swiss-ai/Apertus-8B-2509", + "prompt": "Write a concise technical summary of Kubernetes GPU scheduling.", + "temperature": 0, + "max_tokens": 200 + }' + ``` + + + +## Support + +If you have any issues or questions, contact Super Protocol on [Discord](https://discord.gg/superprotocol) or via the [contact form](https://superprotocol.zendesk.com/hc/en-us/requests/new).
\ No newline at end of file diff --git a/docs/cli/images/swarm-connect-info.png b/docs/cli/images/swarm-connect-info.png new file mode 100644 index 00000000..0bb18c99 Binary files /dev/null and b/docs/cli/images/swarm-connect-info.png differ diff --git a/docs/cli/images/swarm-create-bucket.png b/docs/cli/images/swarm-create-bucket.png new file mode 100644 index 00000000..a43438aa Binary files /dev/null and b/docs/cli/images/swarm-create-bucket.png differ diff --git a/docs/cli/images/create-kubernetes-space.png b/docs/cli/images/swarm-create-kubernetes-space.png similarity index 100% rename from docs/cli/images/create-kubernetes-space.png rename to docs/cli/images/swarm-create-kubernetes-space.png diff --git a/docs/cli/images/swarm-create-service-account-keys.png b/docs/cli/images/swarm-create-service-account-keys.png new file mode 100644 index 00000000..a495e42d Binary files /dev/null and b/docs/cli/images/swarm-create-service-account-keys.png differ diff --git a/docs/cli/images/swarm-create-service-account-window.png b/docs/cli/images/swarm-create-service-account-window.png new file mode 100644 index 00000000..1f31dff2 Binary files /dev/null and b/docs/cli/images/swarm-create-service-account-window.png differ diff --git a/docs/cli/images/swarm-create-service-account.png b/docs/cli/images/swarm-create-service-account.png new file mode 100644 index 00000000..37aa9739 Binary files /dev/null and b/docs/cli/images/swarm-create-service-account.png differ diff --git a/docs/cli/images/swarm-ingresses-s3-verified.png b/docs/cli/images/swarm-ingresses-s3-verified.png new file mode 100644 index 00000000..878ff977 Binary files /dev/null and b/docs/cli/images/swarm-ingresses-s3-verified.png differ diff --git a/docs/cli/images/swarm-ingresses-s3.png b/docs/cli/images/swarm-ingresses-s3.png new file mode 100644 index 00000000..1625e653 Binary files /dev/null and b/docs/cli/images/swarm-ingresses-s3.png differ diff --git a/docs/cli/images/swarm-ingresses-verified.png 
b/docs/cli/images/swarm-ingresses-verified.png new file mode 100644 index 00000000..878ff977 Binary files /dev/null and b/docs/cli/images/swarm-ingresses-verified.png differ diff --git a/docs/cli/images/swarm-ingresses-vllm-verified.png b/docs/cli/images/swarm-ingresses-vllm-verified.png new file mode 100644 index 00000000..7551feef Binary files /dev/null and b/docs/cli/images/swarm-ingresses-vllm-verified.png differ diff --git a/docs/cli/images/swarm-ingresses-vllm.png b/docs/cli/images/swarm-ingresses-vllm.png new file mode 100644 index 00000000..eb23ac48 Binary files /dev/null and b/docs/cli/images/swarm-ingresses-vllm.png differ diff --git a/docs/cli/images/ingresses.png b/docs/cli/images/swarm-ingresses.png similarity index 100% rename from docs/cli/images/ingresses.png rename to docs/cli/images/swarm-ingresses.png diff --git a/docs/cli/images/kubernetes-create-cluster.png b/docs/cli/images/swarm-kubernetes-create-cluster.png similarity index 100% rename from docs/cli/images/kubernetes-create-cluster.png rename to docs/cli/images/swarm-kubernetes-create-cluster.png diff --git a/docs/cli/images/kubernetes-download-kubeconfig.png b/docs/cli/images/swarm-kubernetes-download-kubeconfig.png similarity index 100% rename from docs/cli/images/kubernetes-download-kubeconfig.png rename to docs/cli/images/swarm-kubernetes-download-kubeconfig.png diff --git a/docs/cli/images/kubernetes-publish-cluster.png b/docs/cli/images/swarm-kubernetes-publish-cluster.png similarity index 100% rename from docs/cli/images/kubernetes-publish-cluster.png rename to docs/cli/images/swarm-kubernetes-publish-cluster.png diff --git a/docs/cli/images/swarm-log-in.png b/docs/cli/images/swarm-log-in.png deleted file mode 100644 index e7abee2f..00000000 Binary files a/docs/cli/images/swarm-log-in.png and /dev/null differ diff --git a/docs/cli/images/swarm-object-storage-connect-info.png b/docs/cli/images/swarm-object-storage-connect-info.png new file mode 100644 index 00000000..a0ae8e3a Binary 
files /dev/null and b/docs/cli/images/swarm-object-storage-connect-info.png differ diff --git a/docs/cli/images/swarm-object-storage-policy-rules.png b/docs/cli/images/swarm-object-storage-policy-rules.png new file mode 100644 index 00000000..b6bd7faf Binary files /dev/null and b/docs/cli/images/swarm-object-storage-policy-rules.png differ diff --git a/docs/cli/images/swarm-object-storage.png b/docs/cli/images/swarm-object-storage.png new file mode 100644 index 00000000..76a71e73 Binary files /dev/null and b/docs/cli/images/swarm-object-storage.png differ diff --git a/docs/cli/images/swarm-policy-rules-grant-access.png b/docs/cli/images/swarm-policy-rules-grant-access.png new file mode 100644 index 00000000..bb388350 Binary files /dev/null and b/docs/cli/images/swarm-policy-rules-grant-access.png differ diff --git a/docs/cli/images/swarm-sign-in.png b/docs/cli/images/swarm-sign-in.png new file mode 100644 index 00000000..d73fae84 Binary files /dev/null and b/docs/cli/images/swarm-sign-in.png differ diff --git a/static/files/deploy_apertus_official.sh b/static/files/deploy_apertus_official.sh index 1487a1c7..333dcde1 100755 --- a/static/files/deploy_apertus_official.sh +++ b/static/files/deploy_apertus_official.sh @@ -3,7 +3,7 @@ set -euo pipefail SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -BASE_DOMAIN="${BASE_DOMAIN:-monai-swarm.win}" +BASE_DOMAIN="${BASE_DOMAIN:-superprotocol.com}" API_HOST="${API_HOST:-apertus-vllm.${BASE_DOMAIN}}" MODEL_NAME="${MODEL_NAME:-swiss-ai/Apertus-8B-2509}" MODEL_ENTRY_NAME="${MODEL_ENTRY_NAME:-apertus}" diff --git a/static/files/deploy_medgemma_official.sh b/static/files/deploy_medgemma_official.sh index 7845a04e..4cc0bc05 100755 --- a/static/files/deploy_medgemma_official.sh +++ b/static/files/deploy_medgemma_official.sh @@ -3,7 +3,7 @@ set -euo pipefail SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -BASE_DOMAIN="${BASE_DOMAIN:-monai-swarm.win}" +BASE_DOMAIN="${BASE_DOMAIN:-superprotocol.com}" 
API_HOST="${API_HOST:-medgemma-vllm.${BASE_DOMAIN}}" MODEL_NAME="${MODEL_NAME:-google/medgemma-1.5-4b-it}" MODEL_ENTRY_NAME="${MODEL_ENTRY_NAME:-medgemma}" diff --git a/static/files/deploy_qwen_s3.sh b/static/files/deploy_qwen_s3.sh new file mode 100644 index 00000000..8de83c74 --- /dev/null +++ b/static/files/deploy_qwen_s3.sh @@ -0,0 +1,235 @@ +#!/usr/bin/env bash +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# =========================== +# VALIDATE REQUIRED VARS +# =========================== +if [ -z "${API_KEY:-}" ]; then + echo "ERROR: API_KEY must be set. Execute:" >&2 + echo " read -rs API_KEY && export API_KEY" >&2 + exit 1 +fi + +if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then + echo "ERROR: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set." >&2 + echo " export AWS_ACCESS_KEY_ID=" >&2 + echo " export AWS_SECRET_ACCESS_KEY=" >&2 + exit 1 +fi + +if [ -z "${S3_ENDPOINT:-}" ] || [ -z "${S3_BUCKET:-}" ]; then + echo "ERROR: S3_ENDPOINT and S3_BUCKET must be set." 
>&2 + echo " export S3_ENDPOINT=" >&2 + echo " export S3_BUCKET=" >&2 + exit 1 +fi + +S3_MODEL_PATH="${S3_MODEL_PATH:-models/qwen-1.5b}" + +# =========================== +# DEPLOYMENT CONFIG +# =========================== +BASE_DOMAIN="${BASE_DOMAIN:-superprotocol.com}" +API_HOST="${API_HOST:-qwen-vllm-s3.${BASE_DOMAIN}}" +MODEL_NAME="s3://${S3_BUCKET}/${S3_MODEL_PATH}" +MODEL_ENTRY_NAME="${MODEL_ENTRY_NAME:-qwen}" +RELEASE_NAME="${RELEASE_NAME:-vllm-s3}" +IMAGE_REPOSITORY="${IMAGE_REPOSITORY:-vllm/vllm-openai}" +IMAGE_TAG="${IMAGE_TAG:-v0.8.5}" +GPU_MEMORY_UTILIZATION="${GPU_MEMORY_UTILIZATION:-0.85}" +MAX_MODEL_LEN="${MAX_MODEL_LEN:-4096}" +CPU_REQUEST="${CPU_REQUEST:-4}" +MEMORY_REQUEST="${MEMORY_REQUEST:-16Gi}" +GPU_COUNT="${GPU_COUNT:-1}" +PVC_STORAGE="${PVC_STORAGE:-10Gi}" +INGRESS_CLASS="${INGRESS_CLASS:-nginx}" + +need() { command -v "$1" >/dev/null 2>&1 || { echo "Missing dependency: $1" >&2; exit 1; }; } +need kubectl +need helm + +NAMESPACE="${NAMESPACE:-$(kubectl config view --minify -o jsonpath='{..namespace}' 2>/dev/null || true)}" +if [ -z "${NAMESPACE}" ]; then + NAMESPACE="llm" +fi + +SECRET_NAME="${RELEASE_NAME}-auth" +S3_SECRET_NAME="${RELEASE_NAME}-s3-creds" +SERVICE_NAME="${RELEASE_NAME}-${MODEL_ENTRY_NAME}-engine-service" +INGRESS_NAME="${RELEASE_NAME}-api-ingress" + +echo "==> Runtime: vLLM (official helm chart) + S3 model" +echo "==> Namespace: ${NAMESPACE}" +echo "==> Release: ${RELEASE_NAME}" +echo "==> API host: ${API_HOST}" +echo "==> Model (S3): ${MODEL_NAME}" +echo "==> S3 endpoint: ${S3_ENDPOINT}" +echo "==> Image: ${IMAGE_REPOSITORY}:${IMAGE_TAG}" +echo + +kubectl get ns "${NAMESPACE}" >/dev/null 2>&1 || kubectl create ns "${NAMESPACE}" + +helm repo add vllm https://vllm-project.github.io/production-stack >/dev/null 2>&1 || true +helm repo update >/dev/null 2>&1 + +# API key secret +cat < "${VALUES_FILE}" < Values file:" +cat "${VALUES_FILE}" +echo + +KUBECONFIG="${KUBECONFIG:-}" helm upgrade --install "${RELEASE_NAME}" 
vllm/vllm-stack \ + --namespace "${NAMESPACE}" \ + -f "${VALUES_FILE}" \ + --skip-crds \ + --wait --timeout=20m + +cat < Pods:" +kubectl -n "${NAMESPACE}" get pods -o wide +echo +echo "==> Services:" +kubectl -n "${NAMESPACE}" get svc -o wide +echo +echo "==> Ingress:" +kubectl -n "${NAMESPACE}" get ingress -o wide +echo +echo "==> Waiting for vLLM pod readiness..." +kubectl -n "${NAMESPACE}" wait --for=condition=ready pod \ + -l "model=${MODEL_ENTRY_NAME},helm-release-name=${RELEASE_NAME}" \ + --timeout=900s +echo +echo "===================================" +echo "Ready. API base URL: http://${API_HOST}/v1" +echo "Model: ${MODEL_NAME}" +echo "Smoke test:" +echo " curl http://${API_HOST}/v1/models -H \"Authorization: Bearer \${API_KEY}\"" +echo "==================================="
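The MedGemma test request embeds the image as a base64 data URL, and GNU `base64` on Linux wraps its output at 76 columns by default, which breaks the JSON body if pasted as-is. Below is a minimal sketch, assuming a Bash shell, that encodes the image portably (reading from stdin and deleting newlines works with both GNU and BSD/macOS `base64`) and builds the request body from the guide. The function names `image_to_data_url` and `medgemma_payload` are hypothetical and are not part of the deployment scripts; only PNG and JPEG extensions are handled.

```shell
# Hypothetical helpers for the MedGemma test request; not part of deploy_medgemma_official.sh.
# Intended to be sourced into the shell used for the test request.

# Print a data URL ("data:<mime>;base64,<data>") for a PNG or JPEG file.
image_to_data_url() {
  local file="$1" mime
  # Pick the MIME type from the file extension (assumption: the extension is accurate).
  case "${file}" in
    *.png|*.PNG) mime="image/png" ;;
    *.jpg|*.jpeg|*.JPG|*.JPEG) mime="image/jpeg" ;;
    *) echo "Unsupported image type: ${file}" >&2; return 1 ;;
  esac
  [ -f "${file}" ] || { echo "File not found: ${file}" >&2; return 1; }
  # Read from stdin and strip newlines so GNU (wrapping) and BSD base64 give one line.
  printf 'data:%s;base64,%s' "${mime}" "$(base64 < "${file}" | tr -d '\n')"
}

# Print the JSON body of the MedGemma chat request for a given image.
# Base64 output only contains A-Z, a-z, 0-9, "+", "/" and "=", so no JSON escaping is needed.
medgemma_payload() {
  local url
  url="$(image_to_data_url "$1")" || return 1
  printf '{"model":"google/medgemma-1.5-4b-it","messages":[{"role":"user","content":[{"type":"text","text":"Describe this image briefly."},{"type":"image_url","image_url":{"url":"%s"}}]}],"temperature":0,"max_tokens":120}' "${url}"
}
```

For example, `medgemma_payload ./your-image.png > request.json` writes the body to a file that `curl` can send with `-d @request.json` and the same headers as the test request. Passing a file instead of an inline string also avoids shell argument-length limits with large images.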