Hi! I ran into this issue while using an MTP model with the vLLM backend in Docker Model Runner. While I'm here, I'd also like to ask about the vLLM Metal integration: is it activated by default on macOS when using the vLLM backend?
Another thing I noticed: I installed the vLLM backend, but the llama.cpp backend appears to be the one actually loading the model (see the log below). Is that expected, or is something wrong with my setup? I followed this guide: https://www.docker.com/blog/docker-model-runner-integrates-vllm/
Thanks!
Error log:
docker model run hf.co/unsloth/Qwen3.6-27B-MTP-GGUF:UD-Q4_K_XL
Unable to find model 'hf.co/unsloth/Qwen3.6-27B-MTP-GGUF:UD-Q4_K_XL' locally. Pulling from the server.
9c68785fa64f: Pull complete [==================================================>] 17.91GB/17.91GB
00f45cd696de: Pull complete [==================================================>] 931.1MB/931.1MB
b33563055168: Pull complete [==================================================>] 25.41kB/25.41kB
053533475129: Pull complete [==================================================>] 931.1MB/931.1MB
4085665ee36d: Pull complete [==================================================>] 17.91GB/17.91GB
Model pulled successfully
> background model preload failed: preload failed: status=500 body=unable to load runner: error waiting for runner to be ready: llama.cpp terminated unexpectedly: llama.cpp failed: failed to load model
Verbose output:
llama_model_load: error loading model: missing tensor 'blk.64.ssm_conv1d.weight'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/Users/edm/.docker/models/bundles/sha256/ae07dd2945afaaf7034f795ec286ec4cf79e6843e23b75b7c0696e31b6d40244/model/model.gguf'
srv load_model: failed to load model, '/Users/edm/.docker/models/bundles/sha256/ae07dd2945afaaf7034f795ec286ec4cf79e6843e23b75b7c0696e31b6d40244/model/model.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
= 248065 '<|file_sep|>'
print_info: EOG token = 248044 '<|endoftext|>'
print_info: EOG token = 248046 '<|im_end|>'
print_info: EOG token = 248063 '<|fim_pad|>'
print_info: EOG token = 248064 '<|repo_name|>'
print_info: EOG token = 248065 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
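In case it helps with triage, below is a small sketch for checking which tensors the pulled GGUF file actually contains in block 64, the block named in the `missing tensor` error. It assumes the `gguf` Python package (llama.cpp's gguf-py, which provides `GGUFReader`) is installed. The helper names `block_indices`, `block_tensors` and `main` are my own and not part of Docker Model Runner or llama.cpp.

```python
import re
import sys

# GGUF tensor names for per-layer weights follow the "blk.<index>.<name>" convention.
_BLK = re.compile(r"^blk\.(\d+)\.")


def block_indices(tensor_names):
    """Return the sorted, de-duplicated block indices present in the tensor list."""
    indices = set()
    for name in tensor_names:
        m = _BLK.match(name)
        if m:
            indices.add(int(m.group(1)))
    return sorted(indices)


def block_tensors(tensor_names, block):
    """Return the sorted tensor names in one block, with the "blk.<i>." prefix stripped."""
    prefix = f"blk.{block}."  # trailing dot keeps blk.6 from matching blk.64
    return sorted(n[len(prefix):] for n in tensor_names if n.startswith(prefix))


def main(argv):
    if len(argv) not in (2, 3):
        print(f"usage: {argv[0]} MODEL.gguf [BLOCK]", file=sys.stderr)
        return 2
    # Assumption: llama.cpp's gguf-py package is installed (pip install gguf).
    from gguf import GGUFReader

    block = int(argv[2]) if len(argv) == 3 else 64  # 64 = block named in the error
    names = [t.name for t in GGUFReader(argv[1]).tensors]
    indices = block_indices(names)
    print("highest block index:", indices[-1] if indices else None)
    print(f"tensors in blk.{block}:", block_tensors(names, block) or "none")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it with the `model.gguf` path from the log as the first argument. If block 64 exists but holds no `ssm_conv1d.weight`, that would suggest the bundled llama.cpp treats the extra (possibly MTP) block as a regular layer, though I haven't confirmed that.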