diff --git a/serverless/endpoints/endpoint-configurations.mdx b/serverless/endpoints/endpoint-configurations.mdx
index e99692dc..fa428e9f 100644
--- a/serverless/endpoints/endpoint-configurations.mdx
+++ b/serverless/endpoints/endpoint-configurations.mdx
@@ -77,6 +77,16 @@ Number of GPUs assigned to each worker instance. Default is 1. Generally priorit
 
 How long a worker stays active after completing a request before shutting down. You're billed during idle time, but the worker remains warm for immediate processing. Default: 5 seconds.
 
+### Idle endpoint scale-down
+
+Runpod automatically scales down endpoints that go a long time without any requests, so unused endpoints don't keep consuming your account balance.
+* After 3 days with no requests, the endpoint's max workers is reduced to 2 and Runpod sends you an email notification.
+* After 7 days with no requests, max workers is set to 0.
+
+This scale-down is automatic and system-driven, and the timer is based on request activity, so any incoming request resets it.
+
+Once an endpoint has been scaled down this way, it stays at its reduced max workers until you raise the value yourself. To use the endpoint again, increase its max workers in the Runpod console. To prevent an endpoint from scaling down in the first place, make sure it continues to receive requests.
+
 ### Execution timeout
 
 Maximum duration for a single job. When exceeded, the job fails and the worker stops. Keep enabled to prevent runaway jobs. Default: 600s (10 min). Range: 5s to 7 days.
diff --git a/serverless/troubleshooting.mdx b/serverless/troubleshooting.mdx
index d55a91d3..0956bf66 100644
--- a/serverless/troubleshooting.mdx
+++ b/serverless/troubleshooting.mdx
@@ -51,6 +51,17 @@ Check the job status response for error details. Common causes:
 - **OOM (Out of Memory)**: Model or batch size exceeds GPU memory. Reduce batch size or use a larger GPU.
 - **Timeout**: Job exceeded execution timeout. Increase timeout or optimize processing.
 
+## Endpoint scaling issues
+
+### My endpoint was scaled down unexpectedly
+
+If your endpoint's max workers dropped without any change on your end, Runpod scaled the endpoint down automatically. This happens in two situations:
+
+- **Prolonged inactivity**: When an endpoint receives no requests for 3 days, its max workers is reduced to 2, and after 7 days its max workers is set to 0. Runpod emails you when the first reduction happens. For more details, see [idle endpoint scale-down](/serverless/endpoints/endpoint-configurations#idle-endpoint-scale-down).
+- **Repeated unhealthy workers**: When an endpoint consistently produces unhealthy (crashing) workers, Runpod scales it down to stop billing and reduce thrashing, and sends you an email.
+
+To bring the endpoint back, increase its max workers in the [Runpod console](https://www.console.runpod.io/serverless). If the scale-down was caused by unhealthy workers, fix the underlying problem first, or the endpoint may be scaled down again. Check the [logs](/serverless/development/logs) for crash errors, and verify your worker using [local testing](/serverless/development/local-testing).
+
 ## Cold start issues
 
 ### Slow cold starts
diff --git a/serverless/workers/deploy.mdx b/serverless/workers/deploy.mdx
index de53fd83..2e4fd149 100644
--- a/serverless/workers/deploy.mdx
+++ b/serverless/workers/deploy.mdx
@@ -76,6 +76,8 @@ Versioning best practices:
 * Document the specific image version or SHA in your deployment documentation.
 * Keep images as small as possible for faster startup times.
 
+Runpod validates your image name when you create an endpoint, and a reference that doesn't resolve to a published image is rejected at creation time. For example, `runpod/pytorch:latest` is rejected because `:latest` is not a published tag for Runpod's base images. When you deploy a Runpod base image, specify a tag that actually exists. You can browse the available tags for each image on [Docker Hub](https://hub.docker.com/u/runpod). For your own worker images, use a specific version or SHA tag as described above rather than `:latest`.
+
 ## Deploy an endpoint
diff --git a/serverless/workers/overview.mdx b/serverless/workers/overview.mdx
index a2d22639..b6005c0d 100644
--- a/serverless/workers/overview.mdx
+++ b/serverless/workers/overview.mdx
@@ -56,6 +56,10 @@ The system may also spin up **extra workers** during traffic spikes when Docker
 | **Outdated** | Marked for replacement after update | Yes (while processing) |
 | **Unhealthy** | Crashed; auto-retries for up to 7 days | No |
 
+
+If an endpoint repeatedly produces unhealthy workers, Runpod automatically scales it down; see [My endpoint was scaled down unexpectedly](/serverless/troubleshooting#my-endpoint-was-scaled-down-unexpectedly).
+
+
 View worker states in the **Workers** tab of your endpoint in the [Runpod console](https://www.console.runpod.io/serverless).
 
 ## Max worker limits