Skip to content

GenAI servables - refactor input processing#4318

Open
mzegla wants to merge 7 commits into
mainfrom
servables_input_flow_refactor_1
Open

GenAI servables - refactor input processing#4318
mzegla wants to merge 7 commits into
mainfrom
servables_input_flow_refactor_1

Conversation

@mzegla

@mzegla mzegla commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

No description provided.

@mzegla mzegla force-pushed the servables_input_flow_refactor_1 branch 4 times, most recently from f8f5106 to 6fe4461 Compare June 23, 2026 14:53
@mzegla mzegla requested a review from Copilot June 23, 2026 14:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors GenAI servable input handling to use a unified InputRequest + InputProcessor chain, moving generation-config extraction into the API handler and deferring multimodal image decoding (and related validation) out of the OpenAI request parsers.

Changes:

  • Introduces InputRequest and an InputProcessor pipeline (raw prompt extraction, chat template application, tokenization, deferred image decoding, text-content normalization).
  • Updates LM/VLM servables and executors to consume executionContext->inputRequest (and removes legacy prepareInputs overrides in VLM servables).
  • Updates OpenAI handlers/tests to preserve multimodal content arrays in ChatHistory, removes processedJson/imageHistory, and adjusts tools parsing assertions.

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/test/llm/llmnode_test.cpp Updates tests to use executionContext.inputRequest.inputIds.
src/test/llm/input_processing/raw_prompt_extractor_test.cpp Adds unit tests for RawPromptExtractor.
src/test/llm/input_processing/image_decoding_processor_test.cpp Adds unit tests for ImageDecodingProcessor behavior without actual decoding.
src/test/http_openai_handler_test.cpp Updates parsing tests to assert ChatHistory preservation and tool map contents (no processedJson/imageHistory).
src/llm/visual_language_model/legacy/servable.hpp Removes VLM legacy inputText/inputImages fields and sets isVLM in processor context.
src/llm/visual_language_model/legacy/servable.cpp Switches to extractInputRequest(); removes legacy VLM prepareInputs implementation.
src/llm/visual_language_model/legacy/legacy_executor.cpp Uses inputRequest.promptText/inputImages/generationConfig for generation.
src/llm/visual_language_model/continuous_batching/servable.hpp Removes VLM CB inputText/inputImages fields and sets isVLM in processor context.
src/llm/visual_language_model/continuous_batching/servable.cpp Uses inputRequest.* when adding requests; removes legacy VLM prepareInputs.
src/llm/servable.hpp Replaces inputIds/GenerationConfigBuilder in execution context with InputRequest; adds InputProcessorContext.
src/llm/servable.cpp Refactors base parseRequest/prepareInputs to build and process InputRequest.
src/llm/servable_initializer.cpp Populates InputProcessorContext (tokenizer + optional Python template processor).
src/llm/language_model/legacy/servable.cpp Uses inputRequest for generation config and NPU input-length validation.
src/llm/language_model/legacy/legacy_executor.cpp Uses inputRequest.inputIds/generationConfig for generation.
src/llm/language_model/continuous_batching/servable.cpp Uses inputRequest for scheduler limits and pipeline add_request.
src/llm/io_processing/input_request.hpp Adds InputRequest and InputPayload variant.
src/llm/io_processing/input_processors/tokenization_processor.hpp Adds tokenization processor definition.
src/llm/io_processing/input_processors/tokenization_processor.cpp Implements tokenization into req.inputIds.
src/llm/io_processing/input_processors/text_content_normalization_processor.hpp Adds text-only content-array normalizer (LM paths).
src/llm/io_processing/input_processors/text_content_normalization_processor.cpp Implements content-array flattening to string with \\n joins.
src/llm/io_processing/input_processors/raw_prompt_extractor.hpp Adds raw prompt extractor (COMPLETIONS path).
src/llm/io_processing/input_processors/image_decoding_processor.hpp Adds deferred image decoding processor (VLM paths).
src/llm/io_processing/input_processors/image_decoding_processor.cpp Implements image decoding + <ov_genai_image_N> injection into message content.
src/llm/io_processing/input_processors/chat_template_processor.hpp Adds chat template processor (Python and native paths).
src/llm/io_processing/input_processors/chat_template_processor.cpp Implements prompt building from ChatHistory.
src/llm/io_processing/input_processor.hpp Adds orchestrator selecting processors based on config + payload variant.
src/llm/io_processing/input_processor.cpp Builds and executes the processor chain.
src/llm/io_processing/input_processor_context.hpp Adds per-deployment resources for input processing.
src/llm/io_processing/input_processing_config.hpp Adds deployment-level processing config (isVLM).
src/llm/io_processing/base_input_processor.hpp Adds base interface for processing steps.
src/llm/BUILD Adds Bazel targets/deps for new IO processing components.
src/llm/apis/openai_responses.cpp Preserves content arrays in ChatHistory and removes Python processedJson path + eager image decoding.
src/llm/apis/openai_request.hpp Removes processedJson and imageHistory from OpenAIRequest.
src/llm/apis/openai_completions.cpp Preserves multimodal content arrays in ChatHistory and removes eager image decoding + processedJson rebuild.
src/llm/apis/openai_api_handler.hpp Removes getProcessedJson/getImageHistory; adds extractInputRequest().
src/llm/apis/openai_api_handler.cpp Implements extractInputRequest() and removes processedJson mutations from tools parsing.

Comment thread src/llm/io_processing/input_processors/image_decoding_processor.cpp
Comment thread src/llm/io_processing/input_processors/image_decoding_processor.cpp
Comment thread src/llm/io_processing/input_processors/image_decoding_processor.cpp
Comment thread src/llm/servable.cpp
Comment thread src/llm/apis/openai_api_handler.cpp
Comment thread src/llm/io_processing/input_processor.cpp
@mzegla mzegla force-pushed the servables_input_flow_refactor_1 branch from 6fe4461 to f5b6037 Compare June 24, 2026 08:44
@mzegla mzegla requested a review from Copilot June 24, 2026 08:48

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 7 comments.

Comment thread src/llm/apis/openai_api_handler.cpp Outdated
Comment thread src/llm/io_processing/input_processor.cpp
Comment thread src/llm/io_processing/base_input_processor.hpp
Comment thread src/llm/io_processing/input_processors/tokenization_processor.hpp
Comment thread src/llm/io_processing/input_processors/tokenization_processor.hpp Outdated
Comment thread src/llm/servable.cpp

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 4 comments.

Comment thread src/llm/servable.hpp
Comment thread src/llm/servable.hpp
Comment thread src/llm/io_processing/input_processor_context.hpp
Comment thread src/llm/servable.cpp
@mzegla mzegla requested a review from Copilot June 25, 2026 10:56
@mzegla mzegla marked this pull request as ready for review June 25, 2026 10:56

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 42 changed files in this pull request and generated 1 comment.

Comment thread src/llm/apis/openai_completions.cpp Outdated
@mzegla mzegla requested a review from pgladkows June 25, 2026 11:54
req.input = request.chatHistory;
// Populate tools and chat_template_kwargs on the copied ChatHistory so
// ChatTemplateProcessor can access them via get_tools()/get_extra_context().
auto& chatHistory = std::get<ov::genai::ChatHistory>(req.input);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can throw an exception if req.input is not of type ChatHistory. can we try catch that and return error if it happens?

absl::Status process(InputRequest& req) override;

private:
ov::genai::Tokenizer* tokenizer; // non-owning; lifetime tied to InputProcessorContext

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit pick - we can do that since all constructors take tokenizer ref as argument

Suggested change
ov::genai::Tokenizer* tokenizer; // non-owning; lifetime tied to InputProcessorContext
ov::genai::Tokenizer& tokenizer; // non-owning; lifetime tied to InputProcessorContext

allowedMediaDomains(std::move(allowedMediaDomains)) {}

absl::Status ImageDecodingProcessor::process(InputRequest& req) {
ov::genai::ChatHistory& chatHistory = std::get<ov::genai::ChatHistory>(req.input);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, do we need try catch and return error?

absl::Status process(InputRequest& req) override;

private:
ov::genai::Tokenizer* tokenizer; // non-owning; lifetime tied to InputProcessorContext

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ov::genai::Tokenizer* tokenizer; // non-owning; lifetime tied to InputProcessorContext
ov::genai::Tokenizer& tokenizer; // non-owning; lifetime tied to InputProcessorContext

// True when the GenAI built-in tokenizer.apply_chat_template() should be used
// even on Python-enabled builds (i.e. ChatTemplateMode::MINJA).
// False (default) uses PyJinjaTemplateProcessor when PYTHON_DISABLE==0.
bool useMinja = false;

@dkalinowski dkalinowski Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do I put auto deduced caps/workarounds here?

const auto& chatTemplateKwargs = chatTemplateKwargsParsingResult.value();
if (llm_calculator_logger->should_log(spdlog::level::trace)) {
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "VLM chatHistory messages: {}", chatHistory.get_messages().to_json_string());
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "VLM chatHistory.get_tools(): {}", chatHistory.get_tools().to_json_string());

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still have such logs somewhere? I think they were useful

Comment thread src/llm/servable.cpp
if (!getProperties()->inputProcessorContext.config.isVLM &&
std::holds_alternative<ov::genai::ChatHistory>(executionContext->inputRequest.input)) {
const auto& ch = std::get<ov::genai::ChatHistory>(executionContext->inputRequest.input);
for (size_t i = 0; i < ch.size(); i++) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be separate util function

Comment thread common_settings.bzl
"/wd6240",
"/wd6326",
"/wd6385",
"/wd6386",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats this warning?

TEST_P(HttpOpenAIHandlerChatAndResponsesParsingTest, ParsingImageStringWithNoMimePrefixFails) {
// Without a "data:..." prefix the URL falls through to the local-filesystem loader,
// which is disabled by default.
TEST_P(HttpOpenAIHandlerChatAndResponsesParsingTest, ParsingImageStringWithNoMimePrefixPreservedInChatHistory) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we change the assert here, do we have separate test checking if these requests are rejected during other phase?

TEST_F(HttpOpenAIHandlerParsingTest, ParsingMessagesWithNoContentFieldAddsEmptyStringToChatHistory) {
// Assistant turns that carry only tool_calls legitimately omit the "content" field.
// parseMessages injects an empty-string content entry so chat templates always see
// the field. Verifies the JsonContainer proxy write-through: lastMessage["content"] = ""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of "workarounds" llama.cpp and minja applies. I see we already have it - for all models without autodetection. Is it new behavior or you added this in this PR? If so, which processor handles that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants