Backends
JarvisLabs
This release adds JarvisLabs as a new backend, allowing dstack to provision GPU and CPU VMs on JarvisLabs, including spot GPU instances.
To configure the backend, log into your JarvisLabs account, create an API key, and add it to ~/.dstack/server/config.yml:
projects:
- name: main
  backends:
  - type: jarvislabs
    creds:
      type: api_key
      api_key: ...
Kubernetes
Multiple clusters
A single kubernetes backend can now manage multiple Kubernetes clusters. Each cluster is selected via a kubeconfig context and becomes its own dstack region:
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    contexts:
    - name: gpu-cluster-a
    - name: gpu-cluster-b
Each context can configure its own proxy_jump.hostname and proxy_jump.port, and the namespace is taken from each kubeconfig context. When creating a dstack volume or gateway, the region field selects which cluster the resource is provisioned in.
The previous single-cluster configuration (without contexts) continues to work but is no longer recommended and may be removed in the future. Refer to the backends docs for the up-to-date configuration and migration guidance.
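To make the region-to-cluster mapping concrete, here is a minimal sketch. It assumes the region name equals the kubeconfig context name, which the text above implies but does not state. The `resolve_context` helper and the dict layout are hypothetical illustrations, not part of dstack.

```python
def resolve_context(backend_config, region):
    """Return the context entry whose name matches the requested region.

    Assumption: a dstack region is named after its kubeconfig context.
    This is an illustrative helper, not dstack's actual lookup logic.
    """
    contexts = backend_config.get("contexts", [])
    for context in contexts:
        if context["name"] == region:
            return context
    known = ", ".join(c["name"] for c in contexts) or "none"
    raise ValueError(f"unknown region {region!r}; configured contexts: {known}")


# Mirrors the YAML example above, with one hypothetical proxy_jump override.
backend = {
    "type": "kubernetes",
    "kubeconfig": {"filename": "~/.kube/config"},
    "contexts": [
        {"name": "gpu-cluster-a"},
        {"name": "gpu-cluster-b", "proxy_jump": {"hostname": "203.0.113.10", "port": 32000}},
    ],
}

if __name__ == "__main__":
    print(resolve_context(backend, "gpu-cluster-b"))
```

Raising on an unknown region, rather than falling back to a default cluster, keeps a typo in the region field from silently provisioning a resource in the wrong cluster.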
Object labeling
All dstack-managed Kubernetes resources (jump pods, job pods, gateways, volumes, registry-auth secrets, services) now share a consistent set of labels, making it easier to filter and audit dstack resources with kubectl:
- app.kubernetes.io/name=dstack-{ssh-proxy,job,gateway,volume}
- app.kubernetes.io/instance
- app.kubernetes.io/managed-by=dstack
- k8s.dstack.ai/project
- k8s.dstack.ai/name (if applicable)
- k8s.dstack.ai/user (if applicable)
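As a rough illustration of filtering by these labels, the sketch below builds a label selector and the matching kubectl command line. Only the label keys come from the release notes. The helper names, the example project value `main`, and the assumption that the project label holds the project name are hypothetical. `-l` and `-n` are kubectl's standard label-selector and namespace flags.

```python
import shlex


def dstack_selector(project=None, name=None):
    """Build a label selector matching dstack-managed objects.

    Hypothetical helper: the label keys come from the release notes,
    the label values passed in are assumptions made by the caller.
    """
    parts = ["app.kubernetes.io/managed-by=dstack"]
    if project is not None:
        parts.append(f"k8s.dstack.ai/project={project}")
    if name is not None:
        parts.append(f"app.kubernetes.io/name={name}")
    return ",".join(parts)


def kubectl_get_pods(selector, namespace=None):
    """Return the argument list for `kubectl get pods` with a selector."""
    argv = ["kubectl", "get", "pods", "-l", selector]
    if namespace is not None:
        argv += ["-n", namespace]
    return argv


if __name__ == "__main__":
    selector = dstack_selector(project="main", name="dstack-job")
    # Print the command instead of running it, so the sketch needs no cluster.
    print(shlex.join(kubectl_get_pods(selector)))
```

Comma-separated `key=value` pairs in a selector are combined with AND, so each extra label narrows the result.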
Bug fixes
- Jobs no longer retry indefinitely when the target fleet is at capacity.
- Negative retry.duration values (e.g. -1) are now rejected during configuration parsing instead of silently producing a nonsensical retry spec.
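The second fix amounts to validating the duration at parse time. A minimal sketch of that idea follows. The `parse_retry_duration` function and the formats it accepts (an integer number of seconds, or a number with an `s`, `m`, `h`, or `d` suffix) are assumptions for illustration and do not reproduce dstack's actual parser.

```python
import re

# Seconds per supported unit suffix (an assumed, simplified set).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_PATTERN = re.compile(r"^(-?\d+)\s*([smhd]?)$")


def parse_retry_duration(value):
    """Parse a duration into seconds, rejecting negative values.

    Hypothetical parser: accepts an int (seconds) or a string such as
    "90", "15m", or "2h". Raises ValueError for anything else.
    """
    if isinstance(value, bool):
        raise ValueError(f"invalid duration: {value!r}")
    if isinstance(value, int):
        seconds = value
    else:
        match = _PATTERN.match(str(value).strip())
        if match is None:
            raise ValueError(f"invalid duration: {value!r}")
        number, unit = match.groups()
        seconds = int(number) * _UNITS.get(unit, 1)
    if seconds < 0:
        # Fail fast instead of carrying a nonsensical value into the retry spec.
        raise ValueError(f"duration must not be negative: {value!r}")
    return seconds
```

Rejecting the value where the configuration is parsed surfaces the error to the user at submit time, not later when the retry logic would have to interpret it.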
What's changed
- Fix Kubernetes backend utils.py typing by @un-def in #3889
- [CI] Bump pyright-action by @un-def in #3888
- Reject negative retry durations by @pragnyanramtha in #3885
- Fix infinite job retry when fleet is at capacity by @jvstme in #3887
- Kubernetes: multiple clusters support by @un-def in #3884
- Add JarvisLabs backend by @peterschmidt85 in #3875
- Kubernetes: standardize object labeling by @un-def in #3891
- [Docs] Fix gen_schema_reference.py on Python 3.10 by @un-def in #3883
New contributors
- @pragnyanramtha made their first contribution in #3885
Full changelog: 0.20.20...0.20.21