Support kokoro model #4192

Open
michalkulakowski wants to merge 1 commit into main from mkulakow/kokoro_model_support

@michalkulakowski
Collaborator

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

Copilot AI review requested due to automatic review settings May 8, 2026 11:50
Contributor

Copilot AI left a comment

Pull request overview

This PR adds initial Kokoro TTS support to OVMS’ MediaPipe TTS node by extending request/options handling (language/speed/voice), adapting speaker-embedding loading to the pipeline’s expected shape, and updating build images/dependencies for Kokoro-related requirements.

Changes:

  • Extend TTS calculator options and request parsing to support language, speed, and updated voice handling.
  • Load speaker embeddings using the pipeline-reported embedding shape and introduce Kokoro-specific WAV output preparation.
  • Update Docker/Make build flow to optionally install espeak-ng and to build against a non-default OpenVINO GenAI source.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| tts_asr_roundtrip.py | Adds a standalone TTS→ASR roundtrip script for endpoint validation. |
| src/audio/text_to_speech/t2s_servable.hpp | Refactors includes/forward decls; adds members needed for pipeline/voices concurrency. |
| src/audio/text_to_speech/t2s_servable.cpp | Loads speaker embeddings using the pipeline’s expected shape. |
| src/audio/text_to_speech/t2s_calculator.proto | Adds language and speed options to TTS node configuration. |
| src/audio/text_to_speech/t2s_calculator.cc | Parses voice/language/speed from JSON and forwards them into TTS generation; switches audio output writer. |
| src/audio/text_to_speech/BUILD | Adds GenAI dependency for the TTS calculator target. |
| src/audio/speech_to_text/s2t_servable.cpp | Minor whitespace cleanup. |
| src/audio/audio_utils.hpp | Declares Kokoro-specific audio output helper. |
| src/audio/audio_utils.cpp | Implements Kokoro WAV writer (24kHz/float) and updates headers. |
| Makefile | Adds ESPEAK build arg plumbing (default enabled). |
| Dockerfile.ubuntu | Installs espeak-ng optionally; switches GenAI clone to a fork/branch variable. |
| Dockerfile.redhat | Installs espeak-ng optionally; switches GenAI clone to a fork/branch variable. |
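
For readers unfamiliar with what a "24kHz/float" WAV output involves, the following is a minimal Python sketch of packing mono 32-bit float samples into a WAV container. It is not the C++ helper from src/audio/audio_utils.cpp, which is not shown in this review. The function name `float_wav_bytes` is hypothetical, and the sketch uses the simplest 16-byte `fmt ` chunk with format tag 3 (IEEE float); stricter writers also emit an extension size field and a `fact` chunk.

```python
import struct


def float_wav_bytes(samples, sample_rate=24000):
    """Pack mono float samples into a minimal IEEE-float WAV byte string."""
    channels = 1
    bits_per_sample = 32
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align

    # Sample data: little-endian 32-bit floats.
    data = struct.pack("<%df" % len(samples), *samples)

    # "fmt " chunk body: format tag 3 means IEEE float.
    fmt = struct.pack(
        "<HHIIHH", 3, channels, sample_rate, byte_rate, block_align, bits_per_sample
    )

    body = (
        b"WAVE"
        + b"fmt " + struct.pack("<I", len(fmt)) + fmt
        + b"data" + struct.pack("<I", len(data)) + data
    )
    # The RIFF size field counts everything after the first 8 bytes.
    return b"RIFF" + struct.pack("<I", len(body)) + body
```

Ten samples of silence produce a 44-byte header followed by 40 bytes of sample data.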

Comment thread tts_asr_roundtrip.py
Comment on lines +281 to +287
def tts_request(endpoint: str, model: str, voice: str, prompt: str, language: str) -> bytes:
    url = endpoint.rstrip("/") + "/audio/speech"
    payload = {
        "model": model,
        "voice": voice,
        "input": prompt,
    }
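
In the visible lines, the `language` argument is accepted but does not appear in the payload. A minimal sketch of a payload builder that forwards the optional fields is given here for illustration. The helper name `build_tts_payload` is hypothetical, and the JSON keys `language` and `speed` are assumptions taken from the option names in the PR overview, not confirmed request fields.

```python
def build_tts_payload(model, voice, prompt, language=None, speed=None):
    """Build the JSON body for a speech request, adding optional fields only when set."""
    payload = {
        "model": model,
        "voice": voice,
        "input": prompt,
    }
    if language is not None:
        payload["language"] = language  # assumed key name
    if speed is not None:
        payload["speed"] = speed  # assumed key name
    return payload
```

Leaving unset options out of the body lets the server fall back to whatever defaults are configured on the node.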
Comment thread tts_asr_roundtrip.py
Comment on lines +296 to +333
def split_text_into_chunks(text: str, max_chars: int) -> list[str]:
    if max_chars <= 0:
        return [text]
    text = text.strip()
    if len(text) <= max_chars:
        return [text]

    sentences = []
    buf = []
    for ch in text:
        buf.append(ch)
        if ch in "。!?;\n":
            sentence = "".join(buf).strip()
            if sentence:
                sentences.append(sentence)
            buf = []
    if buf:
        sentence = "".join(buf).strip()
        if sentence:
            sentences.append(sentence)

    chunks = []
    current = ""
    for s in sentences:
        if not current:
            current = s
            continue
        if len(current) + len(s) <= max_chars:
            current += s
        else:
            chunks.append(current)
            current = s
    if current:
        chunks.append(current)

    if not chunks:
        chunks = [text[i : i + max_chars] for i in range(0, len(text), max_chars)]
    return chunks
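
As written, the fixed-width fallback in the excerpt runs only when no chunks were produced, so a single sentence longer than `max_chars` is returned unsplit. A minimal sketch of a post-processing step that enforces the limit on every chunk is shown here; the helper name `hard_wrap` is hypothetical and is not part of the PR.

```python
def hard_wrap(chunks, max_chars):
    """Slice any chunk longer than max_chars into fixed-width pieces."""
    if max_chars <= 0:
        # Mirror the excerpt's convention: a non-positive limit disables splitting.
        return list(chunks)
    wrapped = []
    for chunk in chunks:
        for start in range(0, len(chunk), max_chars):
            wrapped.append(chunk[start:start + max_chars])
    return wrapped
```

Chunks that already fit within the limit pass through unchanged, so the step can be applied to the output of the sentence-based splitter without altering its normal behavior.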
if (streamIt != payload.parsedJson->MemberEnd()) {
    return absl::InvalidArgumentError("streaming is not supported");
}
SPDLOG_LOGGER_DEBUG(t2s_calculator_logger, "1");
Comment on lines +157 to +160
ov::Tensor speakerEmbedding;
std::string selectedVoice = "af_alloy";
if (voiceName.has_value()) {
generatedSpeech = pipe->ttsPipeline->generate(inputIt->value.GetString(), pipe->voices[voiceName.value()]);
} else {
generatedSpeech = pipe->ttsPipeline->generate(inputIt->value.GetString());
selectedVoice = voiceName.value();
Comment on lines +162 to +164
if (speakerIt != pipe->voices.end()) {
    speakerEmbedding = speakerIt->second;
}
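
The two excerpts above interleave removed and added lines, so the intended flow is not fully visible. One plausible reading is that the requested voice falls back to a default name and that the embedding is used only when the name is present in the voices map. A minimal Python sketch of that reading follows; the function name `resolve_speaker_embedding` is hypothetical, and the behavior for unknown voices (returning `None`) is an assumption, not something the diff confirms.

```python
DEFAULT_VOICE = "af_alloy"  # default name taken from the excerpt


def resolve_speaker_embedding(voices, voice_name=None):
    """Return (selected voice name, embedding or None if the voice is unknown)."""
    selected = voice_name if voice_name is not None else DEFAULT_VOICE
    # dict.get returns None for a missing key, mirroring the "not found" branch.
    return selected, voices.get(selected)
```

Returning the selected name alongside the embedding makes it easy for a caller to report which voice was requested when the lookup fails.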
Comment thread Dockerfile.ubuntu
Comment on lines +228 to 234
ARG ov_genai_branch=kokoro_tts
ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
WORKDIR /openvino_genai/
# hadolint ignore=DL3003
RUN if [ "$ov_use_binary" == "0" ]; then \
git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
cmake -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE -DCMAKE_CXX_FLAGS=" ${SDL_OPS} " -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DENABLE_SYSTEM_ICU="True" -DBUILD_TOKENIZERS=OFF -DENABLE_SAMPLES=OFF -DENABLE_TOOLS=OFF -DENABLE_TESTS=OFF -DENABLE_XGRAMMAR=ON -S ./ -B ./build/ && \
Comment thread Dockerfile.redhat
Comment on lines 241 to 247
WORKDIR /openvino_genai/
ARG ov_genai_branch=master
ARG ov_genai_org=openvinotoolkit
ARG ov_genai_branch=kokoro_tts
ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
# hadolint ignore=DL3003
RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \
git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
cmake -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE -DCMAKE_CXX_FLAGS=" ${SDL_OPS} ${LTO_CXX_FLAGS} " -DCMAKE_SHARED_LINKER_FLAGS="${LTO_LD_FLAGS}" -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DENABLE_SYSTEM_ICU="True" -DBUILD_TOKENIZERS=OFF -DENABLE_SAMPLES=OFF -DENABLE_TOOLS=OFF -DENABLE_TESTS=OFF -DENABLE_XGRAMMAR=ON -S ./ -B ./build/ && \
Comment thread Makefile
RUN_GPU_TESTS ?=
GPU ?= 0
NPU ?= 0
ESPEAK ?= 1
Comment thread Dockerfile.ubuntu
Comment on lines +228 to +233
ARG ov_genai_branch=kokoro_tts
ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
WORKDIR /openvino_genai/
# hadolint ignore=DL3003
RUN if [ "$ov_use_binary" == "0" ]; then \
git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
Comment thread Dockerfile.redhat
Comment on lines 241 to +246
WORKDIR /openvino_genai/
ARG ov_genai_branch=master
ARG ov_genai_org=openvinotoolkit
ARG ov_genai_branch=kokoro_tts
ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
# hadolint ignore=DL3003
RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \
git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \