Support kokoro model #4192
Open
michalkulakowski wants to merge 1 commit into main from
Pull request overview
This PR adds initial Kokoro TTS support to OVMS’ MediaPipe TTS node by extending request/options handling (language/speed/voice), adapting speaker-embedding loading to the pipeline’s expected shape, and updating build images/dependencies for Kokoro-related requirements.
Changes:
- Extend TTS calculator options and request parsing to support `language`, `speed`, and updated voice handling.
- Load speaker embeddings using the pipeline-reported embedding shape and introduce Kokoro-specific WAV output preparation.
- Update Docker/Make build flow to optionally install `espeak-ng` and to build against a non-default OpenVINO GenAI source.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| tts_asr_roundtrip.py | Adds a standalone TTS→ASR roundtrip script for endpoint validation. |
| src/audio/text_to_speech/t2s_servable.hpp | Refactors includes/forward decls; adds members needed for pipeline/voices concurrency. |
| src/audio/text_to_speech/t2s_servable.cpp | Loads speaker embeddings using the pipeline’s expected shape. |
| src/audio/text_to_speech/t2s_calculator.proto | Adds language and speed options to TTS node configuration. |
| src/audio/text_to_speech/t2s_calculator.cc | Parses voice/language/speed from JSON and forwards them into TTS generation; switches audio output writer. |
| src/audio/text_to_speech/BUILD | Adds GenAI dependency for the TTS calculator target. |
| src/audio/speech_to_text/s2t_servable.cpp | Minor whitespace cleanup. |
| src/audio/audio_utils.hpp | Declares Kokoro-specific audio output helper. |
| src/audio/audio_utils.cpp | Implements Kokoro WAV writer (24kHz/float) and updates headers. |
| Makefile | Adds ESPEAK build arg plumbing (default enabled). |
| Dockerfile.ubuntu | Installs espeak-ng optionally; switches GenAI clone to a fork/branch variable. |
| Dockerfile.redhat | Installs espeak-ng optionally; switches GenAI clone to a fork/branch variable. |
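For orientation, the "24kHz/float" WAV output mentioned for `src/audio/audio_utils.cpp` could look roughly like the sketch below. This is not the PR's C++ writer: the function name `float_samples_to_wav` is hypothetical, and mono audio, 32-bit samples, and the presence of a `fact` chunk are assumptions made only for illustration.

```python
import struct

SAMPLE_RATE_HZ = 24_000          # rate named in the file summary
WAVE_FORMAT_IEEE_FLOAT = 3       # standard WAVE format tag for float samples


def float_samples_to_wav(samples, sample_rate=SAMPLE_RATE_HZ):
    """Pack float samples into a mono 32-bit IEEE-float WAV byte string.

    Hypothetical helper: mono and 32-bit width are assumptions, not taken
    from the PR.
    """
    channels = 1
    bits_per_sample = 32
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align

    data = struct.pack(f"<{len(samples)}f", *samples)
    # 18-byte fmt chunk: non-PCM formats carry a trailing cbSize field (0 here).
    fmt = struct.pack("<HHIIHHH", WAVE_FORMAT_IEEE_FLOAT, channels,
                      sample_rate, byte_rate, block_align, bits_per_sample, 0)
    # fact chunk: number of sample frames, expected for non-PCM formats.
    fact = struct.pack("<I", len(samples) // channels)

    body = (b"WAVE"
            + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"fact" + struct.pack("<I", len(fact)) + fact
            + b"data" + struct.pack("<I", len(data)) + data)
    # The RIFF size field counts everything after the first 8 bytes.
    return b"RIFF" + struct.pack("<I", len(body)) + body
```

A client that expects 16-bit PCM would need a conversion step, so the sample format is worth stating in the endpoint documentation.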
Comment on lines +281 to +287

    def tts_request(endpoint: str, model: str, voice: str, prompt: str, language: str) -> bytes:
        url = endpoint.rstrip("/") + "/audio/speech"
        payload = {
            "model": model,
            "voice": voice,
            "input": prompt,
        }
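In the quoted lines, `tts_request` accepts `language`, but the visible part of the payload does not include it (the snippet is truncated, so it may be added further down). If the intent is to exercise the new language option, a payload builder could forward it as sketched here. The helper name `build_tts_payload` and the JSON key `"language"` are assumptions based on the overview, not confirmed by the diff.

```python
from typing import Optional


def build_tts_payload(model: str, voice: str, prompt: str,
                      language: Optional[str] = None) -> dict:
    """Build the JSON body for the speech endpoint.

    Hypothetical helper; the "language" key is an assumed field name.
    """
    payload = {
        "model": model,
        "voice": voice,
        "input": prompt,
    }
    # Only send the field when the caller actually provided a language,
    # so servers that do not know the option see an unchanged request.
    if language:
        payload["language"] = language
    return payload
```

Keeping payload construction separate from the HTTP call also makes the script testable without a running endpoint.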
Comment on lines +296 to +333

    def split_text_into_chunks(text: str, max_chars: int) -> list[str]:
        if max_chars <= 0:
            return [text]
        text = text.strip()
        if len(text) <= max_chars:
            return [text]

        sentences = []
        buf = []
        for ch in text:
            buf.append(ch)
            if ch in "。!?;\n":
                sentence = "".join(buf).strip()
                if sentence:
                    sentences.append(sentence)
                buf = []
        if buf:
            sentence = "".join(buf).strip()
            if sentence:
                sentences.append(sentence)

        chunks = []
        current = ""
        for s in sentences:
            if not current:
                current = s
                continue
            if len(current) + len(s) <= max_chars:
                current += s
            else:
                chunks.append(current)
                current = s
        if current:
            chunks.append(current)

        if not chunks:
            chunks = [text[i : i + max_chars] for i in range(0, len(text), max_chars)]
        return chunks
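One property of the quoted function is that a single sentence longer than `max_chars` is emitted as one oversized chunk. The final `if not chunks` fallback only runs when no sentence was collected, which cannot happen for non-empty stripped text. If a hard upper bound is wanted, a post-processing step along these lines could enforce it. The helper name `enforce_max_chars` is hypothetical, and splitting mid-sentence is a design assumption rather than something the PR states.

```python
def enforce_max_chars(chunks: list[str], max_chars: int) -> list[str]:
    """Hard-split any chunk longer than max_chars; shorter chunks pass through.

    Hypothetical helper. Empty chunks are dropped as a side effect of the
    range-based slicing.
    """
    if max_chars <= 0:
        # Mirror the original function: a non-positive limit disables splitting.
        return list(chunks)
    bounded = []
    for chunk in chunks:
        for start in range(0, len(chunk), max_chars):
            bounded.append(chunk[start:start + max_chars])
    return bounded
```

For example, `enforce_max_chars(split_text_into_chunks(text, 200), 200)` would guarantee that no request exceeds 200 characters, at the cost of occasionally cutting inside a sentence.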
    if (streamIt != payload.parsedJson->MemberEnd()) {
        return absl::InvalidArgumentError("streaming is not supported");
    }
    SPDLOG_LOGGER_DEBUG(t2s_calculator_logger, "1");
Comment on lines +157 to +160

    + ov::Tensor speakerEmbedding;
    + std::string selectedVoice = "af_alloy";
      if (voiceName.has_value()) {
    -     generatedSpeech = pipe->ttsPipeline->generate(inputIt->value.GetString(), pipe->voices[voiceName.value()]);
    - } else {
    -     generatedSpeech = pipe->ttsPipeline->generate(inputIt->value.GetString());
    +     selectedVoice = voiceName.value();
Comment on lines +162 to +164

    if (speakerIt != pipe->voices.end()) {
        speakerEmbedding = speakerIt->second;
    }
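Read together, the two voice-related snippets suggest this flow: start from the default voice `af_alloy`, override it when the request names a voice, then look the name up in the loaded voices and leave the embedding empty when it is missing. A small Python model of that flow is sketched below. It is not the calculator's code: `resolve_voice` is a hypothetical name and returning `None` for an unknown voice is an assumption standing in for the default-constructed `ov::Tensor`.

```python
from typing import Any, Optional, Tuple

DEFAULT_VOICE = "af_alloy"  # default shown in the quoted C++ snippet


def resolve_voice(requested: Optional[str],
                  voices: dict) -> Tuple[str, Optional[Any]]:
    """Return (voice_name, embedding), with embedding None if not loaded.

    Hypothetical model of the C++ flow; None stands in for an empty tensor.
    """
    name = requested if requested is not None else DEFAULT_VOICE
    # dict.get mirrors the find()/end() check: no insertion on a miss.
    return name, voices.get(name)
```

Whether a request for an unknown voice should fall back silently or be rejected with an error is a behavior choice the snippet alone does not settle.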
Comment on lines +228 to 234

    + ARG ov_genai_branch=kokoro_tts
    + ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
      WORKDIR /openvino_genai/
      # hadolint ignore=DL3003
      RUN if [ "$ov_use_binary" == "0" ]; then \
    -     git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
    +     git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
          cmake -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE -DCMAKE_CXX_FLAGS=" ${SDL_OPS} " -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DENABLE_SYSTEM_ICU="True" -DBUILD_TOKENIZERS=OFF -DENABLE_SAMPLES=OFF -DENABLE_TOOLS=OFF -DENABLE_TESTS=OFF -DENABLE_XGRAMMAR=ON -S ./ -B ./build/ && \
Comment on lines 241 to 247

      WORKDIR /openvino_genai/
    - ARG ov_genai_branch=master
    - ARG ov_genai_org=openvinotoolkit
    + ARG ov_genai_branch=kokoro_tts
    + ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
      # hadolint ignore=DL3003
      RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \
    -     git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
    +     git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
          cmake -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE -DCMAKE_CXX_FLAGS=" ${SDL_OPS} ${LTO_CXX_FLAGS} " -DCMAKE_SHARED_LINKER_FLAGS="${LTO_LD_FLAGS}" -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DENABLE_SYSTEM_ICU="True" -DBUILD_TOKENIZERS=OFF -DENABLE_SAMPLES=OFF -DENABLE_TOOLS=OFF -DENABLE_TESTS=OFF -DENABLE_XGRAMMAR=ON -S ./ -B ./build/ && \
    RUN_GPU_TESTS ?=
    GPU ?= 0
    NPU ?= 0
    ESPEAK ?= 1
Comment on lines +228 to +233

    + ARG ov_genai_branch=kokoro_tts
    + ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
      WORKDIR /openvino_genai/
      # hadolint ignore=DL3003
      RUN if [ "$ov_use_binary" == "0" ]; then \
    -     git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
    +     git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
Comment on lines 241 to +246

      WORKDIR /openvino_genai/
    - ARG ov_genai_branch=master
    - ARG ov_genai_org=openvinotoolkit
    + ARG ov_genai_branch=kokoro_tts
    + ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git
      # hadolint ignore=DL3003
      RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \
    -     git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
    +     git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \
🛠 Summary
JIRA/Issue if applicable.
Describe the changes.
🧪 Checklist