docs: Document idle/unhealthy endpoint auto scale-down and image name validation #674
base: main
Changes from all commits
c901e7e
afaee1a
331fd5d
ff84fb4
9ccf79c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -51,6 +51,17 @@ Check the job status response for error details. Common causes: | |
| - **OOM (Out of Memory)**: Model or batch size exceeds GPU memory. Reduce batch size or use a larger GPU. | ||
| - **Timeout**: Job exceeded execution timeout. Increase timeout or optimize processing. | ||
|
|
||
| ## Endpoint scaling issues | ||
|
|
||
| ### My endpoint was scaled down unexpectedly | ||
|
Contributor
Author
Added the |
||
|
|
||
| If your endpoint's max workers dropped without any change on your end, Runpod scaled the endpoint down automatically. This happens in two situations: | ||
|
|
||
| - **Prolonged inactivity**: When an endpoint receives no requests for 3 days, its max workers is reduced to 2, and after 7 days its max workers is set to 0. Runpod emails you when the first reduction happens. For more details, see [idle endpoint scale-down](/serverless/endpoints/endpoint-configurations#idle-endpoint-scale-down). | ||
| - **Repeated unhealthy workers**: When an endpoint consistently produces unhealthy (crashing) workers, Runpod scales it down to stop billing and reduce thrashing, and sends you an email. | ||
|
|
||
| To bring the endpoint back, increase its max workers in the [Runpod console](https://www.console.runpod.io/serverless). If the scale-down was caused by unhealthy workers, fix the underlying problem first, or the endpoint may be scaled down again. Check the [logs](/serverless/development/logs) for crash errors, and verify your worker using [local testing](/serverless/development/local-testing). | ||
|
|
||
| ## Cold start issues | ||
|
|
||
| ### Slow cold starts | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -76,6 +76,8 @@ Versioning best practices: | |
| * Document the specific image version or SHA in your deployment documentation. | ||
| * Keep images as small as possible for faster startup times. | ||
|
|
||
| Runpod validates your image name when you create an endpoint, and a reference that doesn't resolve to a published image is rejected at creation time. For example, `runpod/pytorch:latest` is rejected because `:latest` is not a published tag for Runpod's base images. When you deploy a Runpod base image, specify a tag that actually exists. You can browse the available tags for each image on [Docker Hub](https://hub.docker.com/u/runpod). For your own worker images, use a specific version or SHA tag as described above rather than `:latest`. | ||
|
Contributor
Author
Documented image-name validation per DOCS-452 (SLS-8): Layer 2 image verification flipped from observe to enforce, so image names are now validated at endpoint creation and |
||
|
|
||
| ## Deploy an endpoint | ||
|
|
||
| <Tip> | ||
|
|
||
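The image-name hunk above recommends a specific version or SHA tag instead of `:latest`. A minimal local pre-check for that recommendation might look like the sketch below. The helper names `image_tag` and `is_pinned` are hypothetical, the parsing is simplified, and the code does not query any registry, so it does not reproduce Runpod's server-side validation, which per the PR text checks whether the reference resolves to a published image.

```python
from typing import Optional


def image_tag(reference: str) -> Optional[str]:
    """Return the tag of a Docker image reference, or None if it has no tag."""
    # Drop a digest suffix such as "@sha256:..." before looking for a tag.
    name = reference.split("@", 1)[0]
    # Only the last path segment can carry a tag; a colon in an earlier
    # segment belongs to a registry port (for example "localhost:5000").
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.split(":", 1)[1]
    return None


def is_pinned(reference: str) -> bool:
    """True if the reference has a digest or an explicit tag other than 'latest'."""
    if "@" in reference:
        return True
    tag = image_tag(reference)
    return tag is not None and tag != "latest"


if __name__ == "__main__":
    # "myuser/worker" is a made-up image name used only for illustration.
    for ref in ("runpod/pytorch:latest", "myuser/worker:v1.2.3", "myuser/worker"):
        print(ref, "pinned" if is_pinned(ref) else "not pinned")
```

A check like this only catches unpinned references early; a pinned tag can still be rejected at endpoint creation if it does not exist on the registry.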
Added the `Idle endpoint scale-down` subsection from DOCS-452 (SLS-7): endpoints idle for 3 days have `max_workers` auto-reduced to 2 with an email sent, and idle for 7 days have `max_workers` set to 0; this is system-driven and users must raise max workers manually to resume.

Source: https://linear.app/runpod/issue/DOCS-452/fip-document-idle-endpoint-lifecycle-unhealthy-worker-auto-scale-down
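
The idle thresholds described in this comment can be illustrated with a small sketch. Only the 3-day and 7-day thresholds and the resulting values (2 and 0) come from the PR. The function name `effective_max_workers`, the inclusive boundaries, and the choice never to raise an endpoint already configured below 2 are assumptions made for illustration, not Runpod's implementation.

```python
from datetime import datetime, timedelta, timezone

# Thresholds and target values as stated in the PR text.
REDUCE_AFTER = timedelta(days=3)
DISABLE_AFTER = timedelta(days=7)
REDUCED_MAX_WORKERS = 2


def effective_max_workers(configured_max: int,
                          last_request_at: datetime,
                          now: datetime) -> int:
    """Return the max workers an idle endpoint would end up with (hypothetical)."""
    idle = now - last_request_at
    if idle >= DISABLE_AFTER:
        return 0
    if idle >= REDUCE_AFTER:
        # Assumption: an endpoint configured below 2 is left unchanged.
        return min(configured_max, REDUCED_MAX_WORKERS)
    return configured_max


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for days in (1, 3, 7):
        print(days, effective_max_workers(5, now - timedelta(days=days), now))
```

Because the scale-down is system-driven, nothing in this sketch restores the original value; per the PR, max workers must be raised manually afterwards.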