[feat] added integration with OpenVINO Model Server#940

Draft
dtrawins wants to merge 5 commits into docker:main from dtrawins:ovms

@dtrawins

No description provided.

@dtrawins dtrawins changed the title Ovms [feat] added integration with OpenVINO Model Server May 25, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request integrates the OpenVINO Model Server (OVMS) as a backend, allowing users to run OpenVINO IR models. It introduces logic to handle packaging and downloading of models without standard weight files (such as GGUF or SafeTensors) when an OpenVINO repository is detected. Feedback highlights two critical issues: first, downloading all files in an OpenVINO repository can pull in unnecessary large weights (like .safetensors or .gguf), which should be filtered out; second, using os.Stat to check for the OVMS binary will fail if the binary is configured via system PATH, and should be replaced with exec.LookPath.

Comment on lines +61 to +66
if isOpenVINORepo {
for _, f := range files {
if f.Type == "file" {
allFiles = append(allFiles, f)
}
}
Contributor

critical

Downloading every file in an OpenVINO repository can pull in large weight files that are not needed (such as .safetensors, .gguf, or non-OpenVINO .bin files like pytorch_model.bin). This slows model creation and can exhaust disk space. We should filter the file list so that only the required OpenVINO IR files (.xml and their matching .bin files) and configuration files are downloaded.

if isOpenVINORepo {
	xmlStems := make(map[string]bool)
	for _, f := range files {
		if f.Type == "file" && strings.HasSuffix(strings.ToLower(f.Path), ".xml") {
			xmlStems[f.Path[:len(f.Path)-4]] = true
		}
	}
	for _, f := range files {
		if f.Type == "file" {
			lowerPath := strings.ToLower(f.Path)
			if strings.HasSuffix(lowerPath, ".safetensors") || strings.HasSuffix(lowerPath, ".gguf") || strings.HasSuffix(lowerPath, ".dduf") {
				continue
			}
			if strings.HasSuffix(lowerPath, ".bin") {
				stem := f.Path[:len(f.Path)-4]
				if !xmlStems[stem] {
					continue
				}
			}
			allFiles = append(allFiles, f)
		}
	}
}
References
  1. User empathy — How does this affect the people who use, operate, and maintain this system? Consider developer ergonomics, operational burden, error messages, failure modes, and the debugging experience. (link)
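
To see what the suggested filter keeps and drops, here is a self-contained sketch of the same logic as a standalone function. The `repoFile` type and the `filterOpenVINOFiles` name are assumptions made for illustration; the PR's actual file type and surrounding code may differ.

```go
package main

import "strings"

// repoFile is a hypothetical stand-in for the repository listing
// entry used in the PR; only Type and Path are assumed here.
type repoFile struct {
	Type string
	Path string
}

// filterOpenVINOFiles keeps OpenVINO IR pairs (.xml plus the .bin
// with the same stem) and other small files such as configs, and
// drops weight formats that an OpenVINO backend does not need.
func filterOpenVINOFiles(files []repoFile) []repoFile {
	// First pass: record the stem of every .xml file.
	xmlStems := make(map[string]bool)
	for _, f := range files {
		if f.Type == "file" && strings.HasSuffix(strings.ToLower(f.Path), ".xml") {
			xmlStems[f.Path[:len(f.Path)-len(".xml")]] = true
		}
	}

	// Second pass: skip foreign weights and unpaired .bin files.
	var kept []repoFile
	for _, f := range files {
		if f.Type != "file" {
			continue
		}
		lower := strings.ToLower(f.Path)
		switch {
		case strings.HasSuffix(lower, ".safetensors"),
			strings.HasSuffix(lower, ".gguf"),
			strings.HasSuffix(lower, ".dduf"):
			continue // weight formats for other backends
		case strings.HasSuffix(lower, ".bin") &&
			!xmlStems[f.Path[:len(f.Path)-len(".bin")]]:
			continue // .bin without a matching .xml, e.g. pytorch_model.bin
		}
		kept = append(kept, f)
	}
	return kept
}
```

For a listing containing `openvino_model.xml`, `openvino_model.bin`, `pytorch_model.bin`, `model.safetensors`, and `config.json`, this sketch keeps the IR pair and `config.json` and drops the other two.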

Contributor

@dtrawins does this make sense? You know more about this repo format/file format than me

Comment thread pkg/inference/backends/ovms/ovms.go Outdated
dtrawins and others added 2 commits May 26, 2026 01:28
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Comment thread README.md

# Get information about a specific model
curl http://localhost:8080/models/ai/smollm2
curl http://localhost:13434/models/hf.co/OpenVINO/Qwen3-0.6B-int4-ov
Contributor

We can add OpenVINO examples, but we should avoid replacing the existing ones.

2 participants