Description
Atelet maintains a node-local image cache, but the scheduler does not know which images are cached on each node. As a result, an actor may be assigned to a cold node and pull its images again, even when another eligible node already has them cached.
Image preparation happens synchronously during ResumeActor, so unnecessary pulls increase actor startup latency and registry traffic.
Proposal
- Let atelet report its cached image digests to the control plane.
- Store cache information by node.
- After applying existing constraints, prefer available workers whose node has all required actor images cached.
- Fall back to the current random selection when no cache hit is available.
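The reporting and storage steps above could be sketched as follows. This is only an illustration: `NodeImageCacheIndex`, `report`, `forget`, and `has_all` are hypothetical names, not existing atelet or control-plane APIs, and the sketch assumes each report carries the node's full digest set rather than a delta.

```python
from dataclasses import dataclass, field


@dataclass
class NodeImageCacheIndex:
    """Hypothetical control-plane store: node ID -> image digests cached there."""

    _by_node: dict = field(default_factory=dict)

    def report(self, node_id, digests):
        # Assumption: every report is a full snapshot of the node's cache,
        # so digests evicted since the last report drop out automatically.
        self._by_node[node_id] = frozenset(digests)

    def forget(self, node_id):
        # Drop a node that has left the cluster; unknown nodes are ignored.
        self._by_node.pop(node_id, None)

    def has_all(self, node_id, required):
        # A node that never reported is treated as cold (empty cache).
        return frozenset(required) <= self._by_node.get(node_id, frozenset())
```

Replacing the whole set on each report keeps the store simple at the cost of larger messages; a delta protocol would be an alternative if digest lists grow large.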
Local snapshot availability remains a hard constraint. Image cache affinity is applied only among nodes that can restore the selected snapshot.
A general scoring framework is not required: use cache-hit preference with fallback.
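A minimal sketch of this selection rule, under stated assumptions: `Worker`, `select_worker`, and the `has_snapshot` flag are hypothetical and stand in for whatever the scheduler already uses to express its existing constraints. `cached_by_node` is assumed to map a node ID to the set of image digests that node last reported.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    worker_id: str
    node_id: str
    has_snapshot: bool  # hypothetical: node can restore the selected snapshot


def select_worker(workers, required_digests, cached_by_node, rng=random):
    """Pick a worker: hard constraints first, then cache-hit preference."""
    # Hard constraint: only nodes that can restore the snapshot are eligible.
    eligible = [w for w in workers if w.has_snapshot]
    if not eligible:
        return None

    required = frozenset(required_digests)
    # Cache hit: the worker's node has every required image digest cached.
    warm = [
        w for w in eligible
        if required <= cached_by_node.get(w.node_id, frozenset())
    ]
    # Prefer warm workers; otherwise fall back to random selection.
    return rng.choice(warm or eligible)
```

Because the preference is a plain two-tier filter, a node with the images cached but without the snapshot is never chosen, and the behavior with an empty cache map is identical to the current random selection.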
Expected Outcome
Reduce unnecessary image pulls, lowering ResumeActor latency and making it more consistent.
Related Issues
This proposal complements those efforts by making worker selection aware of node-local image cache availability.