Support kokoro model by michalkulakowski · Pull Request #4192 · openvinotoolkit/model_server

michalkulakowski · 2026-05-08T11:50:13Z

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

Unit tests added.
The documentation updated.
Change follows security best practices.

Copilot

Pull request overview

This PR adds initial Kokoro TTS support to OVMS’ MediaPipe TTS node by extending request/options handling (language/speed/voice), adapting speaker-embedding loading to the pipeline’s expected shape, and updating build images/dependencies for Kokoro-related requirements.

Changes:

Extend TTS calculator options and request parsing to support language, speed, and updated voice handling.
Load speaker embeddings using the pipeline-reported embedding shape and introduce Kokoro-specific WAV output preparation.
Update Docker/Make build flow to optionally install espeak-ng and to build against a non-default OpenVINO GenAI source.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 14 comments.


File	Description
`tts_asr_roundtrip.py`	Adds a standalone TTS→ASR roundtrip script for endpoint validation.
`src/audio/text_to_speech/t2s_servable.hpp`	Refactors includes/forward decls; adds members needed for pipeline/voices concurrency.
`src/audio/text_to_speech/t2s_servable.cpp`	Loads speaker embeddings using the pipeline’s expected shape.
`src/audio/text_to_speech/t2s_calculator.proto`	Adds `language` and `speed` options to TTS node configuration.
`src/audio/text_to_speech/t2s_calculator.cc`	Parses `voice/language/speed` from JSON and forwards them into TTS generation; switches audio output writer.
`src/audio/text_to_speech/BUILD`	Adds GenAI dependency for the TTS calculator target.
`src/audio/speech_to_text/s2t_servable.cpp`	Minor whitespace cleanup.
`src/audio/audio_utils.hpp`	Declares Kokoro-specific audio output helper.
`src/audio/audio_utils.cpp`	Implements Kokoro WAV writer (24kHz/float) and updates headers.
`Makefile`	Adds `ESPEAK` build arg plumbing (default enabled).
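`Dockerfile.ubuntu`	Installs `espeak-ng` optionally; clones GenAI from the `ov_genai_repo`/`ov_genai_branch` build args, which now default to a fork's `kokoro_tts` branch.
`Dockerfile.redhat`	Installs `espeak-ng` optionally; clones GenAI from the `ov_genai_repo`/`ov_genai_branch` build args, which now default to a fork's `kokoro_tts` branch.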
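The table says `audio_utils.cpp` gains a Kokoro WAV writer at 24 kHz with float samples. As a reference for that container layout, here is a minimal Python sketch of a WAV file holding IEEE float samples. It assumes "float" means 32-bit little-endian IEEE 754 samples (format tag 3). `float32_wav_bytes` is a hypothetical helper and not the PR's C++ implementation.

```python
import struct
from typing import Sequence

WAVE_FORMAT_IEEE_FLOAT = 3  # format tag for IEEE 754 float samples


def float32_wav_bytes(samples: Sequence[float], sample_rate: int = 24000,
                      channels: int = 1) -> bytes:
    """Pack interleaved float samples into an in-memory WAV file."""
    bits_per_sample = 32
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    data = struct.pack(f"<{len(samples)}f", *samples)

    # 18-byte fmt chunk (WAVEFORMATEX with cbSize = 0), as non-PCM formats expect.
    fmt_chunk = struct.pack("<4sIHHIIHHH", b"fmt ", 18, WAVE_FORMAT_IEEE_FLOAT,
                            channels, sample_rate, byte_rate, block_align,
                            bits_per_sample, 0)
    # fact chunk: number of sample frames, expected for non-PCM data.
    fact_chunk = struct.pack("<4sII", b"fact", 4, len(samples) // channels)
    data_chunk = struct.pack("<4sI", b"data", len(data)) + data

    # RIFF size counts everything after the 8-byte RIFF header.
    riff_size = 4 + len(fmt_chunk) + len(fact_chunk) + len(data_chunk)
    header = struct.pack("<4sI4s", b"RIFF", riff_size, b"WAVE")
    return header + fmt_chunk + fact_chunk + data_chunk
```

To produce a `.wav` file, write the returned bytes to a file opened in binary mode. The 24000 Hz default is used here only because the table states 24 kHz.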

+def tts_request(endpoint: str, model: str, voice: str, prompt: str, language: str) -> bytes:
+    url = endpoint.rstrip("/") + "/audio/speech"
+    payload = {
+        "model": model,
+        "voice": voice,
+        "input": prompt,
+    }
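The `tts_request` hunk is cut off. It accepts `language`, but the visible payload does not include it, and the code that sends the request and returns the audio is not shown. Below is a minimal, standard-library-only sketch of a complete client. It assumes the optional JSON fields are named `language` and `speed`, mirroring the options the calculator is described as parsing. It also splits out a hypothetical `build_tts_request` helper so the payload can be checked without a running server.

```python
import json
import urllib.request
from typing import Optional


def build_tts_request(endpoint: str, model: str, voice: str, prompt: str,
                      language: Optional[str] = None,
                      speed: Optional[float] = None) -> urllib.request.Request:
    """Build a JSON POST request for the /audio/speech endpoint."""
    url = endpoint.rstrip("/") + "/audio/speech"
    payload = {"model": model, "voice": voice, "input": prompt}
    # Assumption: optional fields are named "language" and "speed".
    if language is not None:
        payload["language"] = language
    if speed is not None:
        payload["speed"] = speed
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tts_request(endpoint: str, model: str, voice: str, prompt: str,
                language: Optional[str] = None, speed: Optional[float] = None,
                timeout: float = 120.0) -> bytes:
    """Send the request and return the raw audio bytes from the response body."""
    req = build_tts_request(endpoint, model, voice, prompt, language, speed)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

For example, `tts_request("http://localhost:8000/v3", "kokoro", "af_alloy", "Hello", language="en")` would target a hypothetical local deployment. The base path, model name, and language code all depend on the actual server configuration.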


+def split_text_into_chunks(text: str, max_chars: int) -> list[str]:
+    if max_chars <= 0:
+        return [text]
+    text = text.strip()
+    if len(text) <= max_chars:
+        return [text]
+
+    sentences = []
+    buf = []
+    for ch in text:
+        buf.append(ch)
+        if ch in "。！？；\n":
+            sentence = "".join(buf).strip()
+            if sentence:
+                sentences.append(sentence)
+            buf = []
+    if buf:
+        sentence = "".join(buf).strip()
+        if sentence:
+            sentences.append(sentence)
+
+    chunks = []
+    current = ""
+    for s in sentences:
+        if not current:
+            current = s
+            continue
+        if len(current) + len(s) <= max_chars:
+            current += s
+        else:
+            chunks.append(current)
+            current = s
+    if current:
+        chunks.append(current)
+
+    if not chunks:
+        chunks = [text[i : i + max_chars] for i in range(0, len(text), max_chars)]
+    return chunks
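As excerpted, `split_text_into_chunks` only breaks on the CJK punctuation `。！？；` and on newlines. The `if not chunks` fallback is unreachable for non-empty input. As a result, English text without newlines, or any single sentence longer than `max_chars`, comes back as one oversized chunk. One possible variant follows, offered as a sketch rather than the script's actual code. It also splits on ASCII `.!?;` (which will split decimals such as `3.14`) and hard-splits any sentence longer than `max_chars` (which can cut words). It also takes a hypothetical `sep` argument so that space-delimited languages keep a space between merged sentences.

```python
SENTENCE_END = "。！？；.!?;\n"  # CJK and ASCII sentence terminators


def split_text_into_chunks(text: str, max_chars: int, sep: str = "") -> list[str]:
    """Split text into chunks of at most max_chars characters (when max_chars > 0)."""
    text = text.strip()
    if max_chars <= 0 or len(text) <= max_chars:
        return [text]

    # 1. Cut after every sentence terminator.
    sentences, buf = [], []
    for ch in text:
        buf.append(ch)
        if ch in SENTENCE_END:
            sentence = "".join(buf).strip()
            if sentence:
                sentences.append(sentence)
            buf = []
    tail = "".join(buf).strip()
    if tail:
        sentences.append(tail)

    # 2. Hard-split sentences that are too long on their own (may cut words).
    pieces = []
    for s in sentences:
        pieces.extend(s[i:i + max_chars] for i in range(0, len(s), max_chars))

    # 3. Greedily merge pieces while the joined chunk still fits.
    chunks, current = [], ""
    for p in pieces:
        if not current:
            current = p
        elif len(current) + len(sep) + len(p) <= max_chars:
            current += sep + p
        else:
            chunks.append(current)
            current = p
    if current:
        chunks.append(current)
    return chunks
```

With `sep=""`, Chinese input behaves as in the original. For English text, `sep=" "` keeps the merged sentences readable.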


                if (streamIt != payload.parsedJson->MemberEnd()) {
                    return absl::InvalidArgumentError("streaming is not supported");
                }
+                SPDLOG_LOGGER_DEBUG(t2s_calculator_logger, "1");


+                ov::Tensor speakerEmbedding;
+                std::string selectedVoice = "af_alloy";
                if (voiceName.has_value()) {
-                    generatedSpeech = pipe->ttsPipeline->generate(inputIt->value.GetString(), pipe->voices[voiceName.value()]);
-                } else {
-                    generatedSpeech = pipe->ttsPipeline->generate(inputIt->value.GetString());
+                    selectedVoice = voiceName.value();


+                    if (speakerIt != pipe->voices.end()) {
+                        speakerEmbedding = speakerIt->second;
+                    }
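This hunk works in three steps:

1. It defaults `selectedVoice` to `af_alloy`.
2. It replaces the default with the requested voice name, if one is given.
3. It copies an embedding only when `pipe->voices` contains that name. Otherwise `speakerEmbedding` stays a default-constructed `ov::Tensor`.

The lines that compute `speakerIt` are elided. The Python sketch below therefore assumes that the lookup uses `selectedVoice`. `select_voice` and the plain dict of embeddings are hypothetical stand-ins for the C++ types.

```python
from typing import Mapping, Optional, Tuple, TypeVar

E = TypeVar("E")  # stands in for the speaker-embedding tensor type

DEFAULT_VOICE = "af_alloy"  # default value taken from the C++ hunk


def select_voice(requested: Optional[str],
                 voices: Mapping[str, E]) -> Tuple[str, Optional[E]]:
    """Return the chosen voice name and its embedding, if one is loaded."""
    selected = requested if requested is not None else DEFAULT_VOICE
    # Mirrors the hunk: an unknown name leaves the embedding unset (None here,
    # an empty ov::Tensor in C++) instead of rejecting the request.
    return selected, voices.get(selected)
```

A stricter variant would raise an error for an unknown voice name so that the calculator could return `InvalidArgumentError`. The hunk as shown does not do this.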


+ARG ov_genai_branch=kokoro_tts
+ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
 WORKDIR /openvino_genai/
 # hadolint ignore=DL3003
 RUN if [ "$ov_use_binary" == "0" ]; then \
-    git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
+    git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
    cmake -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE -DCMAKE_CXX_FLAGS=" ${SDL_OPS} " -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DENABLE_SYSTEM_ICU="True" -DBUILD_TOKENIZERS=OFF -DENABLE_SAMPLES=OFF -DENABLE_TOOLS=OFF -DENABLE_TESTS=OFF -DENABLE_XGRAMMAR=ON -S ./ -B ./build/ && \


 WORKDIR /openvino_genai/
-ARG ov_genai_branch=master
-ARG ov_genai_org=openvinotoolkit
+ARG ov_genai_branch=kokoro_tts
+ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
 # hadolint ignore=DL3003
 RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \
-    git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
+    git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
    cmake -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE -DCMAKE_CXX_FLAGS=" ${SDL_OPS} ${LTO_CXX_FLAGS} " -DCMAKE_SHARED_LINKER_FLAGS="${LTO_LD_FLAGS}" -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DENABLE_SYSTEM_ICU="True" -DBUILD_TOKENIZERS=OFF -DENABLE_SAMPLES=OFF -DENABLE_TOOLS=OFF -DENABLE_TESTS=OFF -DENABLE_XGRAMMAR=ON -S ./ -B ./build/ && \


 RUN_GPU_TESTS ?=
 GPU ?= 0
 NPU ?= 0
+ESPEAK ?= 1


+ARG ov_genai_branch=kokoro_tts
+ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
 WORKDIR /openvino_genai/
 # hadolint ignore=DL3003
 RUN if [ "$ov_use_binary" == "0" ]; then \
-    git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
+    git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \


 WORKDIR /openvino_genai/
-ARG ov_genai_branch=master
-ARG ov_genai_org=openvinotoolkit
+ARG ov_genai_branch=kokoro_tts
+ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
 # hadolint ignore=DL3003
 RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \
-    git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
+    git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \


Support kokoro model (commit 9ff49e4)

Copilot AI review requested due to automatic review settings May 8, 2026 11:50

Copilot started reviewing on behalf of michalkulakowski May 8, 2026 11:51

Copilot AI reviewed May 8, 2026


michalkulakowski wants to merge 1 commit into main from mkulakow/kokoro_model_support
