serverless/endpoints/endpoint-configurations.mdx (10 additions, 0 deletions)

@@ -77,6 +77,16 @@ Number of GPUs assigned to each worker instance. Default is 1. Generally priorit

How long a worker stays active after completing a request before shutting down. You're billed during idle time, but the worker remains warm for immediate processing. Default: 5 seconds.

### Idle endpoint scale-down

Contributor Author


Added the Idle endpoint scale-down subsection from DOCS-452 (SLS-7): endpoints idle for 3 days have max_workers auto-reduced to 2 with an email sent, and idle for 7 days have max_workers set to 0; this is system-driven and users must raise max workers manually to resume.

Source: https://linear.app/runpod/issue/DOCS-452/fip-document-idle-endpoint-lifecycle-unhealthy-worker-auto-scale-down


Runpod automatically scales down endpoints that receive no requests for an extended period, so unused endpoints don't keep consuming your account balance.

* After 3 days with no requests, the endpoint's max workers is reduced to 2, and Runpod sends you an email notification.
* After 7 days with no requests, max workers is set to 0.

This scale-down is applied automatically by the system. The inactivity timer is based on request activity, so any incoming request resets it.

Once an endpoint has been scaled down this way, it keeps the reduced max workers value until you raise it yourself. To use the endpoint again, increase its max workers in the Runpod console. To prevent an endpoint from being scaled down in the first place, make sure it continues to receive requests.
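The rules above can be modeled as a small state machine, which makes one subtle point explicit: a new request resets the inactivity timer but does not restore max workers that were already reduced. This is a minimal sketch for reasoning about the behavior, not Runpod's implementation. The class and method names are hypothetical, and it assumes the 3-day reduction only applies when max workers is above 2 (the behavior for endpoints already at or below 2 isn't specified here).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds from the idle scale-down rules described above.
FIRST_REDUCTION_AFTER = timedelta(days=3)
FULL_REDUCTION_AFTER = timedelta(days=7)
REDUCED_MAX_WORKERS = 2


@dataclass
class IdleEndpointModel:
    """Hypothetical model of idle scale-down; not Runpod's implementation."""

    max_workers: int
    last_request_at: datetime

    def on_request(self, now: datetime) -> None:
        # Any incoming request resets the inactivity timer, but it does not
        # restore a max workers value that was already reduced.
        self.last_request_at = now

    def evaluate(self, now: datetime) -> None:
        idle = now - self.last_request_at
        if idle >= FULL_REDUCTION_AFTER:
            self.max_workers = 0
        elif idle >= FIRST_REDUCTION_AFTER and self.max_workers > REDUCED_MAX_WORKERS:
            # Assumption: only endpoints above 2 max workers are reduced here.
            # Runpod also sends an email notification at this step.
            self.max_workers = REDUCED_MAX_WORKERS

    def raise_max_workers(self, value: int) -> None:
        # Only a manual change (for example, in the console) raises max workers again.
        self.max_workers = value


start = datetime(2024, 1, 1)
endpoint = IdleEndpointModel(max_workers=5, last_request_at=start)
endpoint.evaluate(start + timedelta(days=3))
print(endpoint.max_workers)  # 2
endpoint.evaluate(start + timedelta(days=7))
print(endpoint.max_workers)  # 0
endpoint.on_request(start + timedelta(days=8))  # the timer resets...
print(endpoint.max_workers)  # ...but max workers stays 0
endpoint.raise_max_workers(5)
```

Keeping the reduced value in state, rather than recomputing it from idle time alone, mirrors the documented behavior that the endpoint stays scaled down until you raise max workers yourself.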

### Execution timeout

Maximum duration for a single job. When exceeded, the job fails and the worker stops. Keep enabled to prevent runaway jobs. Default: 600s (10 min). Range: 5s to 7 days.

serverless/troubleshooting.mdx (11 additions, 0 deletions)

@@ -51,6 +51,17 @@ Check the job status response for error details. Common causes:
- **OOM (Out of Memory)**: Model or batch size exceeds GPU memory. Reduce batch size or use a larger GPU.
- **Timeout**: Job exceeded execution timeout. Increase timeout or optimize processing.

## Endpoint scaling issues

### My endpoint was scaled down unexpectedly

Contributor Author


Added the My endpoint was scaled down unexpectedly troubleshooting entry covering both DOCS-452 scale-down causes (SLS-7 prolonged inactivity and SLS-121 repeated unhealthy workers), each of which triggers an email, and how to resume the endpoint.

Source: https://linear.app/runpod/issue/DOCS-452/fip-document-idle-endpoint-lifecycle-unhealthy-worker-auto-scale-down


If your endpoint's max workers dropped without any change on your end, Runpod scaled the endpoint down automatically. This happens in two situations:

- **Prolonged inactivity**: When an endpoint receives no requests for 3 days, its max workers is reduced to 2, and after 7 days its max workers is set to 0. Runpod emails you when the first reduction happens. For more details, see [idle endpoint scale-down](/serverless/endpoints/endpoint-configurations#idle-endpoint-scale-down).
- **Repeated unhealthy workers**: When an endpoint consistently produces unhealthy (crashing) workers, Runpod scales it down to stop billing and reduce thrashing, and sends you an email.

To bring the endpoint back, increase its max workers in the [Runpod console](https://www.console.runpod.io/serverless). If the scale-down was caused by unhealthy workers, fix the underlying problem first, or the endpoint may be scaled down again. Check the [logs](/serverless/development/logs) for crash errors, and verify your worker using [local testing](/serverless/development/local-testing).

## Cold start issues

### Slow cold starts

serverless/workers/deploy.mdx (2 additions, 0 deletions)

@@ -76,6 +76,8 @@ Versioning best practices:
* Document the specific image version or SHA in your deployment documentation.
* Keep images as small as possible for faster startup times.

Runpod validates the image name when you create an endpoint and rejects, at creation time, any reference that doesn't resolve to a published image. For example, `runpod/pytorch:latest` is rejected because `:latest` is not a published tag for Runpod's base images. When you deploy a Runpod base image, specify a tag that exists; you can browse the available tags for each image on [Docker Hub](https://hub.docker.com/u/runpod). For your own worker images, use a specific version or SHA tag, as described above, rather than `:latest`.
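If you script endpoint creation, a client-side pre-check can catch unpinned references before Runpod rejects them. This is a minimal sketch, not Runpod's validator: the helper names `split_image_ref` and `pinning_problems` are hypothetical, and the check only inspects the reference string. It flags a missing tag (which Docker treats as `:latest`) or an explicit `:latest`, and accepts a digest (`@sha256:...`). It cannot confirm that a tag actually exists on Docker Hub.

```python
from typing import List, Optional, Tuple


def split_image_ref(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'name[:tag][@digest]' into (name, tag, digest)."""
    name, _, digest = ref.partition("@")
    # A tag colon must come after the last '/', so a registry port
    # such as 'registry.example.com:5000/team/worker' is not mistaken for a tag.
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    tag = None
    if last_colon > last_slash:
        name, tag = name[:last_colon], name[last_colon + 1:]
    return name, tag, digest or None


def pinning_problems(ref: str) -> List[str]:
    """Return reasons a reference is not pinned; an empty list means it looks pinned."""
    _, tag, digest = split_image_ref(ref)
    if digest:
        return []  # a digest identifies one exact image, whatever the tag
    if tag is None:
        return [f"{ref!r} has no tag; Docker treats it as ':latest'"]
    if tag == "latest":
        return [f"{ref!r} uses ':latest'; pin a specific version or SHA"]
    return []


for ref in ["runpod/pytorch:latest", "myuser/my-worker:v1.2.0"]:
    print(ref, pinning_problems(ref))
```

Passing this check only means the reference is pinned; whether the tag is published is still decided by Runpod's validation at endpoint creation.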

Contributor Author


Documented image-name validation per DOCS-452 (SLS-8): Layer 2 image verification flipped from observe to enforce, so image names are now validated at endpoint creation and runpod/pytorch:latest (and other runpod/*:latest references) are rejected because :latest is not a published tag for these base images.

Source: https://linear.app/runpod/issue/DOCS-452/fip-document-idle-endpoint-lifecycle-unhealthy-worker-auto-scale-down


## Deploy an endpoint

<Tip>

serverless/workers/overview.mdx (4 additions, 0 deletions)

@@ -56,6 +56,10 @@ The system may also spin up **extra workers** during traffic spikes when Docker
| **Outdated** | Marked for replacement after update | Yes (while processing) |
| **Unhealthy** | Crashed; auto-retries for up to 7 days | No |

<Note>
If an endpoint repeatedly produces unhealthy workers, Runpod automatically scales it down; see [My endpoint was scaled down unexpectedly](/serverless/troubleshooting#my-endpoint-was-scaled-down-unexpectedly).
</Note>

View worker states in the **Workers** tab of your endpoint in the [Runpod console](https://www.console.runpod.io/serverless).

## Max worker limits