From c901e7ee47b411e340769235b66ed31027a444c7 Mon Sep 17 00:00:00 2001
From: "promptless[bot]"
Date: Wed, 24 Jun 2026 20:15:50 +0000
Subject: [PATCH 1/5] Document idle/unhealthy endpoint auto scale-down and image name validation

- Add idle endpoint scale-down policy (3d->2 workers+email, 7d->0) to endpoint settings
- Note long-term idle and unhealthy auto scale-down in worker states table
- Update Unhealthy worker state row to reflect auto scale-down behavior
- Add troubleshooting entry for unexpected endpoint scale-down
- Document image name validation at endpoint creation (runpod/*:latest rejected)
- Add June 2026 release note for automatic endpoint scale-down

Refs DOCS-452 (SLS-7, SLS-121, SLS-8, SLS-238)
---
 release-notes.mdx | 10 ++++++++++
 serverless/endpoints/endpoint-configurations.mdx | 6 ++++++
 serverless/troubleshooting.mdx | 11 +++++++++++
 serverless/workers/deploy.mdx | 2 ++
 serverless/workers/overview.mdx | 6 +++++-
 5 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/release-notes.mdx b/release-notes.mdx
index 12f832ce..dd64ac24 100644
--- a/release-notes.mdx
+++ b/release-notes.mdx
@@ -4,6 +4,16 @@ sidebarTitle: "Product updates"
 description: "New features, fixes, and improvements for the Runpod platform."
 ---
+
+## Automatic scale-down for idle and unhealthy Serverless endpoints
+
+Serverless endpoints now scale down automatically in two situations to prevent unnecessary charges:
+
+- **Idle endpoints**: An endpoint that receives no requests for 3 days has its max workers reduced to 2 (with an email notification), and after 7 days its max workers is set to 0. Increase max workers manually to resume an idle endpoint. See [idle endpoint scale-down](/serverless/endpoints/endpoint-configurations#idle-endpoint-scale-down).
+- **Repeated unhealthy workers**: An endpoint that consistently produces unhealthy workers is scaled down automatically to stop billing and reduce thrashing, and Runpod sends you an email. See [troubleshooting](/serverless/troubleshooting#my-endpoint-was-scaled-down-unexpectedly).
+
+
+
 ## Flash beta: Run Python functions on cloud GPUs
diff --git a/serverless/endpoints/endpoint-configurations.mdx b/serverless/endpoints/endpoint-configurations.mdx
index e99692dc..e8975d2e 100644
--- a/serverless/endpoints/endpoint-configurations.mdx
+++ b/serverless/endpoints/endpoint-configurations.mdx
@@ -77,6 +77,12 @@ Number of GPUs assigned to each worker instance. Default is 1. Generally priorit
 How long a worker stays active after completing a request before shutting down. You're billed during idle time, but the worker remains warm for immediate processing. Default: 5 seconds.
+### Idle endpoint scale-down
+
+Separately from the per-request idle timeout above, Runpod automatically scales down endpoints that go a long time without any requests, so unused endpoints don't keep consuming your account balance. After 3 days with no requests, the endpoint's max workers is reduced to 2 and Runpod sends you an email notification. After 7 days with no requests, max workers is set to 0. This scale-down is automatic and system-driven, and the timer is based on request activity, so any incoming request resets it.
+
+Once an endpoint has been scaled down this way, it stays at its reduced max workers until you raise the value yourself. To use the endpoint again, increase its max workers in the Runpod console. To prevent an endpoint from scaling down in the first place, make sure it continues to receive requests.
+
 ### Execution timeout
 Maximum duration for a single job. When exceeded, the job fails and the worker stops. Keep enabled to prevent runaway jobs. Default: 600s (10 min). Range: 5s to 7 days.
diff --git a/serverless/troubleshooting.mdx b/serverless/troubleshooting.mdx
index d55a91d3..0956bf66 100644
--- a/serverless/troubleshooting.mdx
+++ b/serverless/troubleshooting.mdx
@@ -51,6 +51,17 @@ Check the job status response for error details. Common causes:
 - **OOM (Out of Memory)**: Model or batch size exceeds GPU memory. Reduce batch size or use a larger GPU.
 - **Timeout**: Job exceeded execution timeout. Increase timeout or optimize processing.
+## Endpoint scaling issues
+
+### My endpoint was scaled down unexpectedly
+
+If your endpoint's max workers dropped without any change on your end, Runpod scaled the endpoint down automatically. This happens in two situations:
+
+- **Prolonged inactivity**: When an endpoint receives no requests for 3 days, its max workers is reduced to 2, and after 7 days its max workers is set to 0. Runpod emails you when the first reduction happens. For more details, see [idle endpoint scale-down](/serverless/endpoints/endpoint-configurations#idle-endpoint-scale-down).
+- **Repeated unhealthy workers**: When an endpoint consistently produces unhealthy (crashing) workers, Runpod scales it down to stop billing and reduce thrashing, and sends you an email.
+
+To bring the endpoint back, increase its max workers in the [Runpod console](https://www.console.runpod.io/serverless). If the scale-down was caused by unhealthy workers, fix the underlying problem first, or the endpoint may be scaled down again. Check the [logs](/serverless/development/logs) for crash errors, and verify your worker using [local testing](/serverless/development/local-testing).
+
 ## Cold start issues
 ### Slow cold starts
diff --git a/serverless/workers/deploy.mdx b/serverless/workers/deploy.mdx
index de53fd83..2e4fd149 100644
--- a/serverless/workers/deploy.mdx
+++ b/serverless/workers/deploy.mdx
@@ -76,6 +76,8 @@ Versioning best practices:
 * Document the specific image version or SHA in your deployment documentation.
 * Keep images as small as possible for faster startup times.
+Runpod validates your image name when you create an endpoint, and a reference that doesn't resolve to a published image is rejected at creation time. For example, `runpod/pytorch:latest` is rejected because `:latest` is not a published tag for Runpod's base images. When you deploy a Runpod base image, specify a tag that actually exists. You can browse the available tags for each image on [Docker Hub](https://hub.docker.com/u/runpod). For your own worker images, use a specific version or SHA tag as described above rather than `:latest`.
+
 ## Deploy an endpoint
diff --git a/serverless/workers/overview.mdx b/serverless/workers/overview.mdx
index a2d22639..033c475e 100644
--- a/serverless/workers/overview.mdx
+++ b/serverless/workers/overview.mdx
@@ -54,10 +54,14 @@ The system may also spin up **extra workers** during traffic spikes when Docker
 | **Running** | Processing requests | Yes |
 | **Throttled** | Temporarily unable to run due to host resource constraints | No |
 | **Outdated** | Marked for replacement after update | Yes (while processing) |
-| **Unhealthy** | Crashed; auto-retries for up to 7 days | No |
+| **Unhealthy** | Crashed; auto-retries, then scales the endpoint down after repeated failures (email sent) | No |
 View worker states in the **Workers** tab of your endpoint in the [Runpod console](https://www.console.runpod.io/serverless).
+
+Runpod also adjusts endpoints automatically based on long-term activity and health. An endpoint that receives no requests for 3 days has its max workers reduced to 2 (you'll get an email), and after 7 days its max workers is set to 0; see [idle endpoint scale-down](/serverless/endpoints/endpoint-configurations#idle-endpoint-scale-down). An endpoint that consistently produces unhealthy workers is also scaled down to stop billing and reduce thrashing, with an email notification; see [My endpoint was scaled down unexpectedly](/serverless/troubleshooting#my-endpoint-was-scaled-down-unexpectedly). In both cases, increase max workers manually to bring the endpoint back.
+
+
 ## Max worker limits
 Account balance determines your maximum workers (flex + active combined):

From afaee1aa8f2f0d84e825a752019f24aeb03348aa Mon Sep 17 00:00:00 2001
From: "promptless[bot]"
Date: Wed, 24 Jun 2026 20:28:36 +0000
Subject: [PATCH 2/5] docs: restore Unhealthy worker state table text per reviewer feedback

Revert the table-text replacement for the Unhealthy worker state and keep the unhealthy scale-down behavior described in the note after the table.
---
 serverless/workers/overview.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/serverless/workers/overview.mdx b/serverless/workers/overview.mdx
index 033c475e..f7e2ca4e 100644
--- a/serverless/workers/overview.mdx
+++ b/serverless/workers/overview.mdx
@@ -54,7 +54,7 @@ The system may also spin up **extra workers** during traffic spikes when Docker
 | **Running** | Processing requests | Yes |
 | **Throttled** | Temporarily unable to run due to host resource constraints | No |
 | **Outdated** | Marked for replacement after update | Yes (while processing) |
-| **Unhealthy** | Crashed; auto-retries, then scales the endpoint down after repeated failures (email sent) | No |
+| **Unhealthy** | Crashed; auto-retries for up to 7 days | No |
 View worker states in the **Workers** tab of your endpoint in the [Runpod console](https://www.console.runpod.io/serverless).

From 331fd5d7b21791a8e85c039baecae347fab26a06 Mon Sep 17 00:00:00 2001
From: "promptless[bot]"
Date: Wed, 24 Jun 2026 20:30:40 +0000
Subject: [PATCH 3/5] docs: remove release-notes/changelog addition per reviewer feedback

Reviewer requested no changelog entry for now; keep all other Serverless documentation updates intact.
---
 release-notes.mdx | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/release-notes.mdx b/release-notes.mdx
index dd64ac24..12f832ce 100644
--- a/release-notes.mdx
+++ b/release-notes.mdx
@@ -4,16 +4,6 @@ sidebarTitle: "Product updates"
 description: "New features, fixes, and improvements for the Runpod platform."
 ---
-
-## Automatic scale-down for idle and unhealthy Serverless endpoints
-
-Serverless endpoints now scale down automatically in two situations to prevent unnecessary charges:
-
-- **Idle endpoints**: An endpoint that receives no requests for 3 days has its max workers reduced to 2 (with an email notification), and after 7 days its max workers is set to 0. Increase max workers manually to resume an idle endpoint. See [idle endpoint scale-down](/serverless/endpoints/endpoint-configurations#idle-endpoint-scale-down).
-- **Repeated unhealthy workers**: An endpoint that consistently produces unhealthy workers is scaled down automatically to stop billing and reduce thrashing, and Runpod sends you an email. See [troubleshooting](/serverless/troubleshooting#my-endpoint-was-scaled-down-unexpectedly).
-
-
-
 ## Flash beta: Run Python functions on cloud GPUs

From ff84fb4564a0cecdb80dfebcd140993927633a96 Mon Sep 17 00:00:00 2001
From: lgunreddi
Date: Wed, 24 Jun 2026 16:38:56 -0400
Subject: [PATCH 4/5] Update endpoint-configurations.mdx

---
 serverless/endpoints/endpoint-configurations.mdx | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/serverless/endpoints/endpoint-configurations.mdx b/serverless/endpoints/endpoint-configurations.mdx
index e8975d2e..fa428e9f 100644
--- a/serverless/endpoints/endpoint-configurations.mdx
+++ b/serverless/endpoints/endpoint-configurations.mdx
@@ -79,7 +79,11 @@ How long a worker stays active after completing a request before shutting down.
 ### Idle endpoint scale-down
-Separately from the per-request idle timeout above, Runpod automatically scales down endpoints that go a long time without any requests, so unused endpoints don't keep consuming your account balance. After 3 days with no requests, the endpoint's max workers is reduced to 2 and Runpod sends you an email notification. After 7 days with no requests, max workers is set to 0. This scale-down is automatic and system-driven, and the timer is based on request activity, so any incoming request resets it.
+Runpod automatically scales down endpoints that go a long time without any requests, so unused endpoints don't keep consuming your account balance.
+* After 3 days with no requests, the endpoint's max workers is reduced to 2 and Runpod sends you an email notification.
+* After 7 days with no requests, max workers is set to 0.
+
+This scale-down is automatic and system-driven, and the timer is based on request activity, so any incoming request resets it.
 Once an endpoint has been scaled down this way, it stays at its reduced max workers until you raise the value yourself. To use the endpoint again, increase its max workers in the Runpod console. To prevent an endpoint from scaling down in the first place, make sure it continues to receive requests.
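Review note, illustrative only and not part of the patch: the idle scale-down rule documented in the hunk above (3 days without requests reduces max workers to 2, 7 days sets it to 0, any request resets the timer) can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not Runpod's implementation. The function name, the constants, the inclusive thresholds, and the choice never to raise a max workers value that is already below 2 are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the documented policy; the constant names are hypothetical.
REDUCE_AFTER = timedelta(days=3)
ZERO_AFTER = timedelta(days=7)
REDUCED_MAX_WORKERS = 2


def scaled_max_workers(current_max: int, last_request_at: datetime, now: datetime) -> int:
    """Return the max workers value the documented idle policy would leave in place.

    Assumptions (not confirmed by the docs): thresholds are inclusive, and an
    endpoint whose max workers is already below 2 is not raised to 2.
    """
    idle = now - last_request_at  # any new request moves last_request_at and resets this
    if idle >= ZERO_AFTER:
        return 0
    if idle >= REDUCE_AFTER:
        return min(current_max, REDUCED_MAX_WORKERS)
    return current_max


if __name__ == "__main__":
    now = datetime(2026, 6, 24, tzinfo=timezone.utc)
    for idle_days in (1, 3, 7):
        result = scaled_max_workers(10, now - timedelta(days=idle_days), now)
        print(f"idle {idle_days}d -> max workers {result}")
```

Keeping the rule as a pure function of the last request time makes the reset behavior explicit: nothing needs to be cleared when a request arrives, because the idle duration is recomputed from the newest timestamp.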
From 9ccf79cf3eb856705b3177c370c8c6ec464cc489 Mon Sep 17 00:00:00 2001
From: lgunreddi
Date: Wed, 24 Jun 2026 16:44:23 -0400
Subject: [PATCH 5/5] Update overview.mdx

---
 serverless/workers/overview.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/serverless/workers/overview.mdx b/serverless/workers/overview.mdx
index f7e2ca4e..b6005c0d 100644
--- a/serverless/workers/overview.mdx
+++ b/serverless/workers/overview.mdx
@@ -56,12 +56,12 @@ The system may also spin up **extra workers** during traffic spikes when Docker
 | **Outdated** | Marked for replacement after update | Yes (while processing) |
 | **Unhealthy** | Crashed; auto-retries for up to 7 days | No |
-View worker states in the **Workers** tab of your endpoint in the [Runpod console](https://www.console.runpod.io/serverless).
-
-Runpod also adjusts endpoints automatically based on long-term activity and health. An endpoint that receives no requests for 3 days has its max workers reduced to 2 (you'll get an email), and after 7 days its max workers is set to 0; see [idle endpoint scale-down](/serverless/endpoints/endpoint-configurations#idle-endpoint-scale-down). An endpoint that consistently produces unhealthy workers is also scaled down to stop billing and reduce thrashing, with an email notification; see [My endpoint was scaled down unexpectedly](/serverless/troubleshooting#my-endpoint-was-scaled-down-unexpectedly). In both cases, increase max workers manually to bring the endpoint back.
+If an endpoint repeatedly produces unhealthy workers, Runpod automatically scales it down; see [My endpoint was scaled down unexpectedly](/serverless/troubleshooting#my-endpoint-was-scaled-down-unexpectedly).
+View worker states in the **Workers** tab of your endpoint in the [Runpod console](https://www.console.runpod.io/serverless).
+
 ## Max worker limits
 Account balance determines your maximum workers (flex + active combined):