
Gemma4 parser #4179

Open: przepeck wants to merge 38 commits into main from przepeck/gemma_parser

Conversation

@przepeck (Collaborator) commented May 5, 2026

🛠 Summary

CVS-184756
Enables the gemma4 model and changes the VLM pipeline to accept special tokens in streaming.

🧪 Checklist

  • Unit tests added.
  • Documentation updated.
  • Change follows security best practices.

@przepeck przepeck force-pushed the przepeck/gemma_parser branch from ab0aa5b to 1c329a4 on May 6, 2026 09:53
Comment thread src/llm/io_processing/gemma4/tool_parser.cpp Outdated
Comment thread src/llm/io_processing/utils.hpp
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp Outdated
@przepeck przepeck marked this pull request as ready for review May 6, 2026 12:07
Copilot AI review requested due to automatic review settings May 6, 2026 12:07
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp Outdated
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp Outdated
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds Gemma4 tool-call parsing support to OVMS LLM I/O processing and adjusts the VLM legacy unary path to preserve/handle special tokens by reusing the streaming text collection logic (per CVS-184756). It also updates model-prep scripts and introduces a dedicated Gemma4 output parser test suite.

Changes:

  • Added a new Gemma4ToolParser and wired it into OutputParser selection.
  • Modified VLM legacy unary response preparation to reconstruct the full text from streamer callbacks (to retain special tokens) and updated OpenAI API handler interfaces accordingly.
  • Added Gemma4 tokenizer/model preparation steps and extensive unit tests for Gemma4 tool-call parsing (unary + streaming scenarios).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Summary per file:

  • windows_prepare_llm_models.bat — Adds Gemma4 tokenizer download step for Windows test model prep.
  • prepare_llm_models.sh — Adds Gemma4 tokenizer conversion/download for Linux test model prep.
  • src/test/llm/output_parsers/gemma4_output_parser_test.cpp — New unit tests covering Gemma4 tool-call parsing and streaming chunk behavior.
  • src/test/http_openai_handler_test.cpp — Updates tests to match the new VLM unary serialization API signature.
  • src/llm/visual_language_model/legacy/servable.cpp — Implements unary "streaming-style" text accumulation to preserve special tokens; passes full text into unary serialization.
  • src/llm/io_processing/utils.hpp — Adds new utility declarations used by Gemma4 parsing/serialization.
  • src/llm/io_processing/utils.cpp — Implements JSON argument writing and delimiter search respecting nested structures/quotes.
  • src/llm/io_processing/output_parser.cpp — Registers the gemma4 tool parser type.
  • src/llm/io_processing/gemma4/tool_parser.hpp — New Gemma4 tool parser interface and streaming state machine declarations.
  • src/llm/io_processing/gemma4/tool_parser.cpp — New Gemma4 tool parser implementation (unary + streaming).
  • src/llm/BUILD — Adds Bazel targets/deps for the Gemma4 parser and RapidJSON usage in utils.
  • src/llm/apis/openai_api_handler.hpp — Updates VLM unary serialization interface to accept an explicit textResponse.
  • src/llm/apis/openai_completions.hpp — Updates VLM unary serialization signature.
  • src/llm/apis/openai_completions.cpp — Uses the provided textResponse for VLM unary serialization.
  • src/llm/apis/openai_responses.hpp — Updates VLM unary serialization signature.
  • src/llm/apis/openai_responses.cpp — Uses the provided textResponse for VLM unary serialization.
  • spelling-whitelist.txt — Adds the new Gemma4 test file to the whitelist.
Comments suppressed due to low confidence (2)

src/llm/apis/openai_completions.cpp:497

  • serializeUnaryResponse(VLMDecodedResults&, textResponse) only emits a choice when textResponse is non-empty. If the model legitimately generates an empty string, this returns an empty choices array, which is not a valid OpenAI-style response (a single choice with empty content/tool_calls should still be returned). Consider always creating one choice and using textResponse as-is (even when empty).
    if (!textResponse.empty()) {
        SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Generated text: {}", textResponse);

        // Workaround to use OVMS unary parsers: get tokens from string
        // This way we have detokenized text from GenAI and calculate tokens, to further convert back to text again, in parseOutputIfNeeded...
        auto generatedTokens = encodeTextToTokens(textResponse);

        SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Generated tokens: {}", generatedTokens);
        ParsedOutput parsedOutput = parseOutputIfNeeded(generatedTokens);
        jsonResponse.StartObject();
        // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
        auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
        jsonResponse.FinishReason(finishReason.value_or("unknown"));
        // index: integer; Choice index, only n=1 supported anyway
        jsonResponse.Index(index++);
        // TODO: logprobs: object/null; Log probability information for the choice.

        if (endpoint == Endpoint::CHAT_COMPLETIONS) {
            jsonResponse.MessageObject(parsedOutput);
        } else if (endpoint == Endpoint::COMPLETIONS) {
            jsonResponse.Text(parsedOutput);
        }

        // finish message object
        jsonResponse.EndObject();
    }
    // finish choices array
    jsonResponse.EndArray();
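The control flow Copilot suggests above can be sketched standalone. This is a hypothetical, simplified stand-in for OVMS's jsonResponse builder (the function name and hand-built JSON are illustrative, and string escaping is omitted); the point is that exactly one choice is emitted even when the generated text is empty.

```cpp
#include <cassert>
#include <string>

// Always emit exactly one choice: an empty generation becomes a choice
// with empty content rather than an empty "choices" array.
// (JSON escaping of textResponse is omitted for brevity.)
std::string serializeChoices(const std::string& textResponse) {
    std::string out = "{\"choices\":[";
    out += "{\"index\":0,";
    out += "\"finish_reason\":\"stop\",";
    out += "\"message\":{\"role\":\"assistant\",\"content\":\"" + textResponse + "\"}}";
    out += "]}";
    return out;
}
```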

src/llm/apis/openai_responses.cpp:676

  • serializeUnaryResponse(VLMDecodedResults&, textResponse) skips adding any ParsedOutput when textResponse is empty, so serializeUnaryResponseImpl() is called with an empty vector. If the model output is legitimately empty, this likely produces a response without any output items/content, which is invalid for the Responses API. Consider emitting a single empty ParsedOutput (or otherwise ensuring one output item exists) even when textResponse is empty.
    std::vector<ParsedOutput> parsedOutputs;
    if (!textResponse.empty()) {
        if (outputParser != nullptr) {
            // Same workaround as in chat completions
            auto generatedTokens = encodeTextToTokens(textResponse);
            parsedOutputs.push_back(parseOutputIfNeeded(generatedTokens));
        } else {
            // Fast path: no output parser, use decoded text directly.
            ParsedOutput output;
            output.content = textResponse;
            parsedOutputs.push_back(std::move(output));
        }
    }
    return serializeUnaryResponseImpl(parsedOutputs);

Comment thread src/llm/io_processing/gemma4/gemma4_tool_parser.cpp
Comment on lines +39 to +67
std::string Gemma4ToolParser::parseArrayParameter(std::string argumentStr) {
    size_t pos = 1;
    std::string parsedArguments = "[";

    while (pos != std::string::npos) {
        size_t stringStartPos = argumentStr.find(TOOL_ARGS_STRING_INDICATOR, pos);
        if (stringStartPos == std::string::npos) {
            break;
        }
        stringStartPos += TOOL_ARGS_STRING_INDICATOR.size();
        size_t stringEndPos = argumentStr.find(TOOL_ARGS_STRING_INDICATOR, stringStartPos);
        if (stringEndPos == std::string::npos) {
            break;
        }

        std::string originalStr = argumentStr.substr(stringStartPos, stringEndPos - stringStartPos);
        size_t quotePos = 0;
        while ((quotePos = originalStr.find('\"', quotePos)) != std::string::npos) {
            originalStr.insert(quotePos, "\\");
            quotePos += 2;
        }
        parsedArguments += "\"" + originalStr + "\",";

        pos = stringEndPos + TOOL_ARGS_STRING_INDICATOR.size() + 1;
    }

    parsedArguments.back() = ']';

    return parsedArguments;
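A self-contained sketch of this logic, with two labeled assumptions: TOOL_ARGS_STRING_INDICATOR is taken to be `'` here (hypothetical; the real delimiter lives in the parser headers), and the no-items case is handled explicitly. Note the quoted code's `parsedArguments.back() = ']'` overwrites the opening `[` when no strings were found, yielding `]` instead of `[]`; this sketch guards that path.

```cpp
#include <cassert>
#include <string>

// Sketch only: builds a JSON string array from items wrapped in an assumed
// single-quote indicator, escaping embedded double quotes.
std::string parseArrayParameter(const std::string& argumentStr) {
    const std::string indicator = "'";  // assumed stand-in for TOOL_ARGS_STRING_INDICATOR
    std::string parsed = "[";
    size_t pos = 1;
    while (pos < argumentStr.size()) {
        size_t start = argumentStr.find(indicator, pos);
        if (start == std::string::npos) break;
        start += indicator.size();
        size_t end = argumentStr.find(indicator, start);
        if (end == std::string::npos) break;
        std::string item = argumentStr.substr(start, end - start);
        // Escape embedded double quotes so the output stays valid JSON.
        size_t q = 0;
        while ((q = item.find('"', q)) != std::string::npos) {
            item.insert(q, "\\");
            q += 2;
        }
        parsed += "\"" + item + "\",";
        pos = end + indicator.size() + 1;
    }
    if (parsed.back() == ',')
        parsed.back() = ']';  // replace trailing comma
    else
        parsed += "]";        // no items: emit an empty JSON array, not "]"
    return parsed;
}
```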
Comment thread src/llm/io_processing/gemma4/tool_parser.cpp Outdated
Comment on lines +38 to +62
void writeArgumentOfAnyType(const rapidjson::Value& arg, rapidjson::Writer<rapidjson::StringBuffer>& writer) {
    if (arg.IsString()) {
        writer.String(arg.GetString());
    } else if (arg.IsInt64()) {
        writer.Int64(arg.GetInt64());
    } else if (arg.IsDouble()) {
        writer.Double(arg.GetDouble());
    } else if (arg.IsBool()) {
        writer.Bool(arg.GetBool());
    } else if (arg.IsArray()) {
        writer.StartArray();
        for (auto& elem : arg.GetArray()) {
            writeArgumentOfAnyType(elem, writer);
        }
        writer.EndArray();
    } else if (arg.IsObject()) {
        writer.StartObject();
        for (auto it = arg.MemberBegin(); it != arg.MemberEnd(); ++it) {
            writer.Key(it->name.GetString());
            writeArgumentOfAnyType(it->value, writer);
        }
        writer.EndObject();
    } else {
        writer.String("");
    }
Comment on lines +356 to +386
std::optional<rapidjson::Document> Gemma4ToolParser::parseChunk(const std::string& chunk, ov::genai::GenerationFinishReason finishReason) {
    if (chunk.empty()) {
        return std::nullopt;
    }

    this->streamingContent += chunk;

    if (parseNewContent()) {
        if (this->currentState == State::ToolCallParameters) {
            return BaseOutputParser::wrapFirstDelta(this->toolCall.name, toolCallIndex);
        }
        if (this->currentState == State::ToolCallEnded) {
            return wrapDeltaArgs(this->toolCall.arguments, toolCallIndex);
        }
        if (this->currentState == State::Content) {
            size_t contentEnd = this->streamingContent.find(TOOL_CALL_START_TAG, this->streamingPosition);
            std::string content;
            if (contentEnd != std::string::npos) {
                content = this->streamingContent.substr(this->streamingPosition, contentEnd - this->streamingPosition);
            } else {
                content = this->streamingContent.substr(this->streamingPosition);
            }
            this->streamingPosition += content.size();
            if (!content.empty()) {
                return wrapDeltaContent(content);
            }
        }
        if (this->currentState == State::AfterToolCall) {
            this->currentState = State::Content;
        }
    }
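The content-delta step of the state machine above can be isolated into a minimal sketch. Names are hypothetical, TOOL_CALL_START_TAG is assumed to be `<tool_call>` for illustration, and (unlike the real parser's state machine) this simplification does not guard against a partially received tag at the end of the buffer.

```cpp
#include <cassert>
#include <string>

// Given the accumulated stream and the current read position, return the
// next content delta, stopping just before any tool-call tag so the tag
// itself is never streamed as user-visible content.
std::string nextContentDelta(const std::string& streamingContent,
                             size_t& streamingPosition) {
    const std::string TOOL_CALL_START_TAG = "<tool_call>";  // assumed tag
    size_t contentEnd = streamingContent.find(TOOL_CALL_START_TAG, streamingPosition);
    std::string content = (contentEnd != std::string::npos)
        ? streamingContent.substr(streamingPosition, contentEnd - streamingPosition)
        : streamingContent.substr(streamingPosition);
    streamingPosition += content.size();  // advance past what we emitted
    return content;
}
```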
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp
Comment thread prepare_llm_models.sh Outdated
@dtrawins dtrawins added this to the 2026.2_rc milestone May 8, 2026
Comment thread src/llm/io_processing/gemma4/reasoning_parser.cpp Outdated
Comment thread src/llm/io_processing/gemma4/reasoning_parser.hpp
Comment thread src/llm/io_processing/gemma4/tool_parser.cpp Outdated
Comment thread src/llm/io_processing/gemma4/gemma4_tool_parser.cpp
Comment thread src/llm/io_processing/gemma4/tool_parser.cpp Outdated
Comment thread src/llm/io_processing/gemma4/tool_parser.cpp Outdated
Comment thread src/llm/io_processing/gemma4/tool_parser.cpp Outdated
Comment thread src/llm/io_processing/output_parser.cpp
Comment thread src/llm/io_processing/output_parser.cpp Outdated
Comment thread src/test/http_openai_handler_test.cpp Outdated
Comment thread src/llm/visual_language_model/legacy/servable.cpp
    int singleQuoteDepth = 0;

    for (size_t i = startPos; i < str.length(); ++i) {
        if (bracketDepth == 0 && braceDepth == 0 && quoteDepth == 0 && singleQuoteDepth == 0 &&
Collaborator comment:

is it also used by other parsers? we should also run bfcl for other parsers that use it so we are sure there is no regression (i think lfm uses it?)
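A hedged reconstruction of the depth-tracking idea in the quoted fragment: find a top-level delimiter (a fixed `,` here) while skipping anything nested in `[]`, `{}`, `"…"` or `'…'`. The function name, the fixed delimiter, and the use of booleans instead of the fragment's depth counters are all illustrative; the real utility in utils.cpp is more general.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the index of the first ',' that sits outside all brackets,
// braces, and quoted spans, or npos if there is none.
size_t findTopLevelComma(const std::string& str, size_t startPos) {
    int bracketDepth = 0, braceDepth = 0;
    bool inDouble = false, inSingle = false;
    for (size_t i = startPos; i < str.length(); ++i) {
        char c = str[i];
        if (inDouble) { if (c == '"') inDouble = false; continue; }
        if (inSingle) { if (c == '\'') inSingle = false; continue; }
        switch (c) {
            case '"':  inDouble = true; break;
            case '\'': inSingle = true; break;
            case '[':  ++bracketDepth; break;
            case ']':  --bracketDepth; break;
            case '{':  ++braceDepth; break;
            case '}':  --braceDepth; break;
            case ',':
                if (bracketDepth == 0 && braceDepth == 0)
                    return i;  // delimiter found outside any nesting
                break;
        }
    }
    return std::string::npos;
}
```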

@przepeck przepeck requested a review from mzegla May 11, 2026 08:12
Comment thread src/llm/visual_language_model/legacy/servable.cpp Outdated
Comment thread src/llm/io_processing/gemma4/reasoning_parser.cpp Outdated