[Bug]: dstack-shim doesn't respect proxy-related environment variables #3906

@madprogrammer

Steps to reproduce

Problem

When dstack-shim starts a job container, the entrypoint runs an install_pkg openssh-server step (plus, in the non-shim flow, install_pkg curl and a curl download of the runner binary). On hosts whose only outbound internet access is through an HTTP proxy, this step fails with a network/DNS error, even though the shim itself was started with http_proxy/https_proxy exported and Docker image pulls succeed (the Docker daemon reads its own proxy config).

The root cause is that no proxy env vars are forwarded into the container, so apt-get/yum/apk inside the entrypoint shell never see them.

Repro

  1. Run dstack-shim on a host that requires http_proxy/https_proxy for outbound traffic.
  2. Submit any job that uses an image without sshd preinstalled (e.g., a plain ubuntu:22.04).
  3. The container entrypoint exits non-zero during apt-get update / apt-get install -y openssh-server.

Where the gap is

The entrypoint chain is built in getSSHShellCommands() and runs install_pkg openssh-server:

https://github.com/dstackai/dstack/blob/master/runner/internal/shim/docker.go#L984-L1000

The container is created in createContainer with Env populated only from PJRT_DEVICE (plus GPU/HABANA vars later); nothing forwards proxy vars from the shim's own environment:

https://github.com/dstackai/dstack/blob/master/runner/internal/shim/docker.go#L862-L901

TaskConfig has no env field at all, so the server can't pass them through either:

https://github.com/dstackai/dstack/blob/master/runner/internal/shim/models.go#L83-L105

The same install snippet exists on the Python side (non-shim flow) and has the same gap:

https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/base/compute.py#L982-L1027

Docker does not auto-propagate the daemon's environment into containers, so even if proxy vars are set on the host (e.g., via get_shim_env or cloud-init), they don't reach the entrypoint.
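
One way to confirm this on an affected host is to run the same small check both on the host (in the shim's environment) and inside the job container, then compare the output. The sketch below is only a diagnostic aid, not dstack code; the helper name proxyVarsIn is made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// proxyVarsIn returns the NAME=value entries of environ whose name is one of
// the usual proxy variables, matched case-insensitively.
// (Hypothetical helper for diagnosis; not part of dstack.)
func proxyVarsIn(environ []string) []string {
	var found []string
	for _, kv := range environ {
		name, _, ok := strings.Cut(kv, "=")
		if !ok {
			continue
		}
		switch strings.ToLower(name) {
		case "http_proxy", "https_proxy", "no_proxy":
			found = append(found, kv)
		}
	}
	return found
}

func main() {
	vars := proxyVarsIn(os.Environ())
	if len(vars) == 0 {
		fmt.Println("no proxy variables visible in this process")
		return
	}
	for _, kv := range vars {
		fmt.Println(kv)
	}
}
```

If the host run prints the proxy settings and the container run reports none, the variables are being dropped at container creation rather than inside the image.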

Suggested fix

Minimal change in createContainer (runner/internal/shim/docker.go): forward proxy vars from the shim's own environment into the container Env:

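// Assumes "os" is imported and envVars is the []string later passed as the container's Env.
// Both lower- and upper-case names are forwarded because tools differ in which spelling they read.
for _, name := range []string{
    "http_proxy", "https_proxy", "no_proxy",
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
} {
    // LookupEnv distinguishes "unset" from "set to empty", so an explicitly empty no_proxy is still forwarded.
    if v, ok := os.LookupEnv(name); ok {
        envVars = append(envVars, name+"="+v)
    }
}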
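
To make that loop easy to unit-test, it could be factored into a small helper that takes the lookup function as a parameter. This is a sketch under assumptions: proxyEnv and proxyEnvNames are hypothetical names, and the PJRT_DEVICE value in main is a placeholder, not what the shim actually sets.

```go
package main

import (
	"fmt"
	"os"
)

// proxyEnvNames lists both spellings of each proxy variable.
var proxyEnvNames = []string{
	"http_proxy", "https_proxy", "no_proxy",
	"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
}

// proxyEnv returns NAME=value entries for every proxy variable that lookup
// reports as set. Passing lookup in (instead of calling os.LookupEnv
// directly) lets tests supply a fake environment.
// (Hypothetical helper; not existing dstack code.)
func proxyEnv(lookup func(string) (string, bool)) []string {
	var env []string
	for _, name := range proxyEnvNames {
		if v, ok := lookup(name); ok {
			env = append(env, name+"="+v)
		}
	}
	return env
}

func main() {
	// Placeholder for the env the shim already builds before creating the container.
	envVars := []string{"PJRT_DEVICE=placeholder"}
	envVars = append(envVars, proxyEnv(os.LookupEnv)...)
	fmt.Println(envVars)
}
```

In createContainer the call site would then reduce to a single append of proxyEnv(os.LookupEnv) onto the existing slice.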

A more flexible follow-up would be to add an Env field to TaskConfig so the server can pass per-task env vars to the shim.

Environment

  • dstack version: 0.20.21
  • Backend: ssh
  • Host OS: Ubuntu 26.04 LTS
  • Container image: nvidia/cuda:12.8.0-devel-ubuntu24.04

Actual behaviour

No response

Expected behaviour

No response

dstack version

0.20.21

Server logs

Exited (none)
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package openssh-server

Additional information

No response
