[feat] added integration with OpenVINO Model Server #940
Code Review
This pull request integrates the OpenVINO Model Server (OVMS) as a backend, allowing users to run OpenVINO IR models. It introduces logic to handle packaging and downloading of models without standard weight files (such as GGUF or SafeTensors) when an OpenVINO repository is detected. Feedback highlights two critical issues: first, downloading all files in an OpenVINO repository can pull in unnecessary large weights (like .safetensors or .gguf), which should be filtered out; second, using os.Stat to check for the OVMS binary will fail if the binary is configured via system PATH, and should be replaced with exec.LookPath.
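Here is a minimal Go sketch of the second suggestion. It assumes the backend holds the configured OVMS binary as a string. resolveOVMSBinary and the default "ovms" are hypothetical names, not the PR's code. os.Stat only checks the given path relative to the working directory. exec.LookPath searches the directories listed in PATH when the name has no path separator, and tries the path directly (also checking that it is executable) when it does.

```go
package main

import (
	"fmt"
	"os/exec"
)

// resolveOVMSBinary (hypothetical helper) resolves the configured OVMS
// executable. A bare name such as "ovms" is searched for in PATH; a name
// containing a path separator is checked directly without consulting PATH.
func resolveOVMSBinary(configured string) (string, error) {
	path, err := exec.LookPath(configured)
	if err != nil {
		return "", fmt.Errorf("OpenVINO Model Server binary %q not found: %w", configured, err)
	}
	return path, nil
}

func main() {
	// "ovms" is an assumed default; the real value would come from the backend configuration.
	path, err := resolveOVMSBinary("ovms")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using OVMS at", path)
}
```

With this approach, a binary installed anywhere on PATH is found. A missing one yields an error that names the value that was looked up, which is easier to debug than a bare stat failure.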
if isOpenVINORepo {
	for _, f := range files {
		if f.Type == "file" {
			allFiles = append(allFiles, f)
		}
	}
Downloading all files in an OpenVINO repository can result in downloading unnecessary large weight files (such as .safetensors, .gguf, or non-OpenVINO .bin files like pytorch_model.bin). This can cause significant performance degradation, slow model creation, and potential disk space exhaustion. We should filter the files to only download the required OpenVINO IR files (.xml and matching .bin pairs) and configuration files.
if isOpenVINORepo {
	// Collect the stems of all OpenVINO IR topology files (*.xml).
	xmlStems := make(map[string]bool)
	for _, f := range files {
		if f.Type == "file" && strings.HasSuffix(strings.ToLower(f.Path), ".xml") {
			xmlStems[f.Path[:len(f.Path)-4]] = true
		}
	}
	for _, f := range files {
		if f.Type == "file" {
			lowerPath := strings.ToLower(f.Path)
			// Skip weight formats that are not needed to serve an OpenVINO IR model.
			if strings.HasSuffix(lowerPath, ".safetensors") || strings.HasSuffix(lowerPath, ".gguf") || strings.HasSuffix(lowerPath, ".dduf") {
				continue
			}
			// Keep a .bin only if it is the weights half of an IR pair (same stem as an .xml).
			if strings.HasSuffix(lowerPath, ".bin") {
				stem := f.Path[:len(f.Path)-4]
				if !xmlStems[stem] {
					continue
				}
			}
			allFiles = append(allFiles, f)
		}
	}
}
References
- User empathy: How does this affect the people who use, operate, and maintain this system? Consider developer ergonomics, operational burden, error messages, failure modes, and the debugging experience.
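The suggested filter can be tested on its own. This standalone Go sketch applies the suggestion's rules to a hypothetical repoFile type. Only the Path and Type fields the snippet reads are modeled, since the PR's real listing type is not shown. This lets you check file selection without downloading a repository.

```go
package main

import (
	"fmt"
	"strings"
)

// repoFile is a hypothetical stand-in for the PR's repository listing entry.
type repoFile struct {
	Path string
	Type string
}

// filterOpenVINOFiles keeps IR pairs (.xml plus the .bin with the same stem)
// and other non-weight files such as JSON configs. It drops .safetensors,
// .gguf and .dduf files, and any .bin without a matching .xml
// (for example pytorch_model.bin).
func filterOpenVINOFiles(files []repoFile) []repoFile {
	xmlStems := make(map[string]bool)
	for _, f := range files {
		if f.Type == "file" && strings.HasSuffix(strings.ToLower(f.Path), ".xml") {
			xmlStems[f.Path[:len(f.Path)-len(".xml")]] = true
		}
	}

	var kept []repoFile
	for _, f := range files {
		if f.Type != "file" {
			continue // skip directories and other non-file entries
		}
		lower := strings.ToLower(f.Path)
		switch {
		case strings.HasSuffix(lower, ".safetensors"),
			strings.HasSuffix(lower, ".gguf"),
			strings.HasSuffix(lower, ".dduf"):
			continue // non-OpenVINO weight formats
		case strings.HasSuffix(lower, ".bin") && !xmlStems[f.Path[:len(f.Path)-len(".bin")]]:
			continue // .bin with no IR .xml partner
		}
		kept = append(kept, f)
	}
	return kept
}

func main() {
	files := []repoFile{
		{"openvino_model.xml", "file"},
		{"openvino_model.bin", "file"},
		{"pytorch_model.bin", "file"},
		{"model.safetensors", "file"},
		{"config.json", "file"},
		{"subdir", "directory"},
	}
	for _, f := range filterOpenVINOFiles(files) {
		fmt.Println(f.Path)
	}
}
```

With this sample listing, main prints openvino_model.xml, openvino_model.bin and config.json. Stems are compared case-sensitively, so an IR pair whose .xml and .bin names differ only in letter case would lose its .bin.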
@dtrawins does this make sense? You know more about this repo format/file format than me
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# Get information about a specific model
curl http://localhost:8080/models/ai/smollm2
curl http://localhost:13434/models/hf.co/OpenVINO/Qwen3-0.6B-int4-ov
We can add OpenVINO examples, but we should avoid replacing the existing ones.
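This Go sketch makes the same "get information about a specific model" request as the curl lines. It covers both the existing model reference and the OpenVINO one, so neither example replaces the other. The helper names modelInfoURL and fetchModelInfo are hypothetical. The base URLs and ports simply mirror the two curl lines; check which port a deployment actually listens on. The response body is returned raw because its format is not assumed here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// modelInfoURL builds <base>/models/<model reference>, the path the curl examples use.
func modelInfoURL(base, model string) string {
	return strings.TrimSuffix(base, "/") + "/models/" + model
}

// fetchModelInfo issues the same GET as the curl example and returns the raw body.
func fetchModelInfo(client *http.Client, base, model string) ([]byte, error) {
	resp, err := client.Get(modelInfoURL(base, model))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s: %s", resp.Request.URL, resp.Status, body)
	}
	return body, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	// The existing example and the OpenVINO example, side by side.
	targets := []struct{ base, model string }{
		{"http://localhost:8080", "ai/smollm2"},
		{"http://localhost:13434", "hf.co/OpenVINO/Qwen3-0.6B-int4-ov"},
	}
	for _, t := range targets {
		body, err := fetchModelInfo(client, t.base, t.model)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(string(body))
	}
}
```

If no server is running, each request fails with a connection error, which is printed rather than treated as fatal.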