Gemma4 parser #4179
Open
przepeck wants to merge 38 commits into
Conversation
dkalinowski
reviewed
May 6, 2026
Contributor
Pull request overview
This PR adds Gemma4 tool-call parsing support to OVMS LLM I/O processing and adjusts the VLM legacy unary path to preserve/handle special tokens by reusing the streaming text collection logic (per CVS-184756). It also updates model-prep scripts and introduces a dedicated Gemma4 output parser test suite.
Changes:
- Added a new `Gemma4ToolParser` and wired it into `OutputParser` selection.
- Modified VLM legacy unary response preparation to reconstruct the full text from streamer callbacks (to retain special tokens) and updated OpenAI API handler interfaces accordingly.
- Added Gemma4 tokenizer/model preparation steps and extensive unit tests for Gemma4 tool-call parsing (unary + streaming scenarios).
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| windows_prepare_llm_models.bat | Adds Gemma4 tokenizer download step for Windows test model prep. |
| prepare_llm_models.sh | Adds Gemma4 tokenizer conversion/download for Linux test model prep. |
| src/test/llm/output_parsers/gemma4_output_parser_test.cpp | New unit tests covering Gemma4 tool-call parsing and streaming chunk behavior. |
| src/test/http_openai_handler_test.cpp | Updates tests to match the new VLM unary serialization API signature. |
| src/llm/visual_language_model/legacy/servable.cpp | Implements unary “streaming-style” text accumulation to preserve special tokens; passes full text into unary serialization. |
| src/llm/io_processing/utils.hpp | Adds new utility declarations used by Gemma4 parsing/serialization. |
| src/llm/io_processing/utils.cpp | Implements JSON argument writing and delimiter search respecting nested structures/quotes. |
| src/llm/io_processing/output_parser.cpp | Registers gemma4 tool parser type. |
| src/llm/io_processing/gemma4/tool_parser.hpp | New Gemma4 tool parser interface + streaming state machine declarations. |
| src/llm/io_processing/gemma4/tool_parser.cpp | New Gemma4 tool parser implementation (unary + streaming). |
| src/llm/BUILD | Adds Bazel targets/deps for Gemma4 parser and RapidJSON usage in utils. |
| src/llm/apis/openai_api_handler.hpp | Updates VLM unary serialization interface to accept explicit textResponse. |
| src/llm/apis/openai_completions.hpp | Updates VLM unary serialization signature. |
| src/llm/apis/openai_completions.cpp | Uses provided textResponse for VLM unary serialization. |
| src/llm/apis/openai_responses.hpp | Updates VLM unary serialization signature. |
| src/llm/apis/openai_responses.cpp | Uses provided textResponse for VLM unary serialization. |
| spelling-whitelist.txt | Adds the new Gemma4 test file to the whitelist list. |
Comments suppressed due to low confidence (2)
src/llm/apis/openai_completions.cpp:497
- `serializeUnaryResponse(VLMDecodedResults&, textResponse)` only emits a choice when `textResponse` is non-empty. If the model legitimately generates an empty string, this returns an empty `choices` array, which is not a valid OpenAI-style response (a single choice with empty content/tool_calls should still be returned). Consider always creating one choice and using `textResponse` as-is (even when empty).
```cpp
if (!textResponse.empty()) {
    SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Generated text: {}", textResponse);
    // Workaround to use OVMS unary parsers: get tokens from string.
    // This way we have detokenized text from GenAI and calculate tokens, to further convert back to text again in parseOutputIfNeeded.
    auto generatedTokens = encodeTextToTokens(textResponse);
    SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Generated tokens: {}", generatedTokens);
    ParsedOutput parsedOutput = parseOutputIfNeeded(generatedTokens);
    jsonResponse.StartObject();
    // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
    auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
    jsonResponse.FinishReason(finishReason.value_or("unknown"));
    // index: integer; choice index, only n=1 supported anyway
    jsonResponse.Index(index++);
    // TODO: logprobs: object/null; log probability information for the choice
    if (endpoint == Endpoint::CHAT_COMPLETIONS) {
        jsonResponse.MessageObject(parsedOutput);
    } else if (endpoint == Endpoint::COMPLETIONS) {
        jsonResponse.Text(parsedOutput);
    }
    // finish message object
    jsonResponse.EndObject();
}
// finish choices array
jsonResponse.EndArray();
```
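The fix Copilot suggests can be sketched as a minimal standalone illustration (this is not the OVMS serializer; `serializeChoices` and the hand-built JSON are hypothetical, and real code would JSON-escape `textResponse`): always emit exactly one choice, passing the text through even when it is empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: serialize a choices array that always contains
// exactly one choice, even when the generated text is empty, so the
// response stays a valid OpenAI-style payload.
std::string serializeChoices(const std::string& textResponse) {
    std::string out = "{\"choices\":[";
    out += "{\"index\":0,\"finish_reason\":\"stop\",";
    // Empty content is emitted as "", never as a missing choice.
    // NOTE: textResponse is not escaped here; a real serializer must escape it.
    out += "\"message\":{\"role\":\"assistant\",\"content\":\"" + textResponse + "\"}}";
    out += "]}";
    return out;
}
```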
src/llm/apis/openai_responses.cpp:676
- `serializeUnaryResponse(VLMDecodedResults&, textResponse)` skips adding any `ParsedOutput` when `textResponse` is empty, so `serializeUnaryResponseImpl()` is called with an empty vector. If the model output is legitimately empty, this likely produces a response without any output items/content, which is invalid for the Responses API. Consider emitting a single empty `ParsedOutput` (or otherwise ensuring one output item exists) even when `textResponse` is empty.
```cpp
std::vector<ParsedOutput> parsedOutputs;
if (!textResponse.empty()) {
    if (outputParser != nullptr) {
        // Same workaround as in chat completions
        auto generatedTokens = encodeTextToTokens(textResponse);
        parsedOutputs.push_back(parseOutputIfNeeded(generatedTokens));
    } else {
        // Fast path: no output parser, use decoded text directly.
        ParsedOutput output;
        output.content = textResponse;
        parsedOutputs.push_back(std::move(output));
    }
}
return serializeUnaryResponseImpl(parsedOutputs);
```
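The suggested remedy can be sketched as follows (hypothetical helper name, with `ParsedOutput` reduced to its `content` field): guarantee at least one output item before serialization.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the real ParsedOutput.
struct ParsedOutput {
    std::string content;
};

// Hypothetical sketch of the fix: the serializer never receives an
// empty vector, so the Responses API always gets one output item.
std::vector<ParsedOutput> buildOutputs(const std::string& textResponse) {
    std::vector<ParsedOutput> parsedOutputs;
    if (!textResponse.empty()) {
        parsedOutputs.push_back(ParsedOutput{textResponse});
    }
    // Fallback: emit a single empty ParsedOutput for legitimately empty output.
    if (parsedOutputs.empty()) {
        parsedOutputs.push_back(ParsedOutput{});
    }
    return parsedOutputs;
}
```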
Comment on lines +39 to +67
```cpp
std::string Gemma4ToolParser::parseArrayParameter(std::string argumentStr) {
    size_t pos = 1;
    std::string parsedArguments = "[";

    while (pos != std::string::npos) {
        size_t stringStartPos = argumentStr.find(TOOL_ARGS_STRING_INDICATOR, pos);
        if (stringStartPos == std::string::npos) {
            break;
        }
        stringStartPos += TOOL_ARGS_STRING_INDICATOR.size();
        size_t stringEndPos = argumentStr.find(TOOL_ARGS_STRING_INDICATOR, stringStartPos);
        if (stringEndPos == std::string::npos) {
            break;
        }

        std::string originalStr = argumentStr.substr(stringStartPos, stringEndPos - stringStartPos);
        size_t quotePos = 0;
        while ((quotePos = originalStr.find('\"', quotePos)) != std::string::npos) {
            originalStr.insert(quotePos, "\\");
            quotePos += 2;
        }
        parsedArguments += "\"" + originalStr + "\",";

        pos = stringEndPos + TOOL_ARGS_STRING_INDICATOR.size() + 1;
    }

    parsedArguments.back() = ']';

    return parsedArguments;
```
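The escaping step above can be illustrated standalone (the input format and `TOOL_ARGS_STRING_INDICATOR` are abstracted away; this sketch also shows how the empty-input edge case can yield `[]` rather than overwriting the last character of `"["`, which the excerpt's `parsedArguments.back() = ']'` would get wrong when no string was found):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of the quote-escaping idea: each element's interior double
// quotes are backslash-escaped before the elements are joined into a
// JSON string array.
std::string toJsonStringArray(const std::vector<std::string>& items) {
    std::string out = "[";
    for (const std::string& item : items) {
        std::string escaped;
        for (char c : item) {
            if (c == '"') escaped += '\\';  // escape interior quotes
            escaped += c;
        }
        out += "\"" + escaped + "\",";
    }
    if (out.back() == ',') out.back() = ']';  // replace trailing comma
    else out += ']';                          // empty input -> "[]"
    return out;
}
```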
Comment on lines +38 to +62
```cpp
void writeArgumentOfAnyType(const rapidjson::Value& arg, rapidjson::Writer<rapidjson::StringBuffer>& writer) {
    if (arg.IsString()) {
        writer.String(arg.GetString());
    } else if (arg.IsInt64()) {
        writer.Int64(arg.GetInt64());
    } else if (arg.IsDouble()) {
        writer.Double(arg.GetDouble());
    } else if (arg.IsBool()) {
        writer.Bool(arg.GetBool());
    } else if (arg.IsArray()) {
        writer.StartArray();
        for (auto& elem : arg.GetArray()) {
            writeArgumentOfAnyType(elem, writer);
        }
        writer.EndArray();
    } else if (arg.IsObject()) {
        writer.StartObject();
        for (auto it = arg.MemberBegin(); it != arg.MemberEnd(); ++it) {
            writer.Key(it->name.GetString());
            writeArgumentOfAnyType(it->value, writer);
        }
        writer.EndObject();
    } else {
        writer.String("");
    }
```
Comment on lines +356 to +386
```cpp
std::optional<rapidjson::Document> Gemma4ToolParser::parseChunk(const std::string& chunk, ov::genai::GenerationFinishReason finishReason) {
    if (chunk.empty()) {
        return std::nullopt;
    }

    this->streamingContent += chunk;

    if (parseNewContent()) {
        if (this->currentState == State::ToolCallParameters) {
            return BaseOutputParser::wrapFirstDelta(this->toolCall.name, toolCallIndex);
        }
        if (this->currentState == State::ToolCallEnded) {
            return wrapDeltaArgs(this->toolCall.arguments, toolCallIndex);
        }
        if (this->currentState == State::Content) {
            size_t contentEnd = this->streamingContent.find(TOOL_CALL_START_TAG, this->streamingPosition);
            std::string content;
            if (contentEnd != std::string::npos) {
                content = this->streamingContent.substr(this->streamingPosition, contentEnd - this->streamingPosition);
            } else {
                content = this->streamingContent.substr(this->streamingPosition);
            }
            this->streamingPosition += content.size();
            if (!content.empty()) {
                return wrapDeltaContent(content);
            }
        }
        if (this->currentState == State::AfterToolCall) {
            this->currentState = State::Content;
        }
    }
```
dtrawins
reviewed
May 6, 2026
dkalinowski
reviewed
May 8, 2026
```cpp
    int singleQuoteDepth = 0;

    for (size_t i = startPos; i < str.length(); ++i) {
        if (bracketDepth == 0 && braceDepth == 0 && quoteDepth == 0 && singleQuoteDepth == 0 &&
```
Collaborator
Is it also used by other parsers? We should also run BFCL for the other parsers that use it so we are sure there is no regression (I think LFM uses it?).
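For reference, the depth-tracking search that snippet belongs to can be sketched as a standalone function (variable names follow the excerpt, but the exact semantics are assumed, and single-quote handling is omitted): the delimiter only matches when it sits outside every bracket, brace, and quoted string.

```cpp
#include <cassert>
#include <string>

// Sketch of a delimiter search respecting nested structures and quotes.
size_t findTopLevel(const std::string& str, char delimiter, size_t startPos = 0) {
    int bracketDepth = 0, braceDepth = 0;
    bool inQuote = false;
    for (size_t i = startPos; i < str.length(); ++i) {
        char c = str[i];
        if (inQuote) {
            if (c == '\\') ++i;              // skip escaped character
            else if (c == '"') inQuote = false;
            continue;
        }
        if (c == '"') inQuote = true;
        else if (c == '[') ++bracketDepth;
        else if (c == ']') --bracketDepth;
        else if (c == '{') ++braceDepth;
        else if (c == '}') --braceDepth;
        else if (c == delimiter && bracketDepth == 0 && braceDepth == 0)
            return i;                        // found at top level
    }
    return std::string::npos;
}
```

The reviewer's concern stands regardless of the implementation details: if this utility is shared (e.g. by the LFM parser), BFCL runs for those parsers are the right regression check.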
dtrawins
approved these changes
May 8, 2026
dkalinowski
reviewed
May 11, 2026
🛠 Summary
CVS-184756
Enables the Gemma4 model and changes the VLM pipeline to accept special tokens in streaming.
🧪 Checklist