Skip to content

fix(dotAI): QA fixes - image model default, playground overflow, embeddings host resolution#35716

Queued
ihoffmann-dot wants to merge 9 commits into
mainfrom
dot-ai-langchain-qa-fixes
Queued

fix(dotAI): QA fixes - image model default, playground overflow, embeddings host resolution#35716
ihoffmann-dot wants to merge 9 commits into
mainfrom
dot-ai-langchain-qa-fixes

Conversation

@ihoffmann-dot
Copy link
Copy Markdown
Member

@ihoffmann-dot ihoffmann-dot commented May 14, 2026

Summary

Bug fixes found during QA testing of the dotAI integration.

  • Default image model: update the example providerConfig JSON in the dotAI config UI from dall-e-3 to gpt-image-1
  • Image Playground overflow: fix JSON.stringify(json, 2) (wrong signature — JSON was not pretty-printed) and add overflow-wrap, word-break, and <pre> wrapping so the response JSON no longer overflows horizontally; add max-height + overflow-y: auto on the <pre> to prevent vertical overflow
  • Embeddings host resolution: EmbeddingContentListener and BulkEmbeddingsRunner were calling getEmbeddingsAPI() without a host, causing EmbeddingsAPIImpl to always fall back to System Host config — indexing silently failed with "no provider config" when dotAI was configured on a specific site only
  • Search form validation: form submitted even when required fields were empty; added reportValidity() check before firing the request and required attributes on the index select and prompt textarea
  • Model dropdown empty: writeModelToDropdown crashed silently due to a wrong variable reference (indexName instead of modelName) and read availableModels which no longer exists in the config response; rewritten to parse providerConfig.chat.model and populate one option per configured model
  • Missing index error: /api/v1/ai/search returned { dotCMSResults: [] } when a non-existent indexName was passed, with no indication the index was missing; now returns 404 with a descriptive error listing known indexes

Changes

  • dot-ai-config-detail.component.ts: dall-e-3gpt-image-1 in EXAMPLE_CONFIG
  • dotai.js: fix JSON.stringify call, add overflow styles, wrap JSON in <pre> with max-height; add reportValidity() guard in doSearchChatJson; rewrite writeModelToDropdown to read from providerConfig.chat.model
  • render.jsp: add required to index select and prompt textarea; add value="" to select placeholder option
  • EmbeddingContentListener.java: extract resolveHost(hostId) helper, pass resolved Host to getEmbeddingsAPI(host) in both addToIndexesIfNeeded and deleteFromIndexes
  • BulkEmbeddingsRunner.java: resolve host per-contentlet and pass to getEmbeddingsAPI(host) (same bug as EmbeddingContentListener, different code path — triggered by the manual "Build Index" button)
  • SearchResource.java: validate index existence via countEmbeddingsByIndex() before running the query; return 404 if not found; pass host to getEmbeddingsAPI(host)

Related Issue

This PR fixes #35730

@github-actions github-actions Bot added Area : Backend PR changes Java/Maven backend code Area : Frontend PR changes Angular/TypeScript frontend code labels May 14, 2026
@ihoffmann-dot ihoffmann-dot changed the title fix(dotAI): QA fixes — image model default, playground overflow, embeddings host resolution fix(dotAI): QA fixes - image model default, playground overflow, embeddings host resolution May 14, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 14, 2026

Claude finished @ihoffmann-dot's task in 2m 49s —— View job


PR Review

Mostly solid bug-fixing PR. A few things worth a second look:

🔴 SearchResource.searchByPost — perf regression on every search

SearchResource.java:130-131 now calls countEmbeddingsByIndex() on every search to validate the index name. That SQL (EmbeddingsSQL.COUNT_EMBEDDINGS_BY_INDEX) is a full-table aggregation over dot_embeddings with COUNT(*), SUM, COUNT(distinct(inode)), STRING_AGG, AVG grouped by index_name. On any sizeable embeddings table this is expensive — and it runs before every legitimate search just to produce a 404 for the unlikely typo case. Consider a cheap existence probe instead (e.g. a SELECT 1 FROM dot_embeddings WHERE index_name = ? LIMIT 1) or a small cache; the heavy aggregation belongs in /info-style endpoints, not the hot search path.

Also: leaking all index names in the error body is fine for an admin-only endpoint, but worth confirming this resource isn't reachable by frontend users — the requiredFrontendUser(true) init suggests it is, in which case the message effectively enumerates available indexes to any logged-in frontend user.

🟡 LangChain4jAIClient.toImageContent — NPE risk

LangChain4jAIClient.java:411

final String url = part.optJSONObject("image_url").optString("url", "");

optJSONObject("image_url") returns null when the key is missing or is not a JSON object (e.g. a bare string "image_url": "https://…", which is a real-world shape some clients send). That NPEs the whole toMessages call. Guard with a null check and fall back to optString("image_url", "") for the string-shaped variant.

The tests test_toMultimodalUserMessage_* only cover the well-formed object shape, so this hole isn't caught.

🟡 EmbeddingContentListener.deleteFromIndexes — dead-looking call

EmbeddingContentListener.java:157

getConfigJson(contentlet.getHost());

Return value is discarded. The only effect is the implicit getAppConfigisEnabled throw inside. Either drop it or call getAppConfig(contentlet.getHost()) directly — readers will spend time wondering what this line does. Same call lower down in addToIndexesIfNeeded resolves the host a second time (once via resolveHost, once via getConfigJsongetAppConfigresolveHost); minor waste, but tidying it would localise the host lookup.

🟡 render.jsp size field — UX regression, doesn't match Kevin's ask

render.jsp:364 replaces a 3-option <select> with a readonly <input type="text" value="1024x1024">. Kevin's review comment suggested either an <input type="number"> (so users can pick sizes that fit the model) or just keeping a single square value. The chosen pattern — an input that looks editable but isn't — is the worst of both worlds. Either remove the field entirely (and have doImageJson hard-code 1024x1024) or make it actually editable.

🟡 dotai.js writeModelToDropdown — silent split semantics

dotai.js:131-148 parses providerConfig.chat.model and splits on ,. That matches the convention elsewhere, but:

  • the (default) label is now always the first split entry, regardless of what the server actually picks at request time — this can mislead users when fallback to a later model kicks in
  • dotAiState.config.providerConfig is assumed to be a string; if it's already an object somewhere (depends on how the config endpoint serialises it), JSON.parse throws and the dropdown silently stays empty (the console.error is the only signal, which the QA case earlier missed)

Worth confirming the shape of dotAiState.config.providerConfig across all config endpoints, and considering surfacing the empty-models case in the UI rather than just the console.

🟢 Pre-existing but worth noting (touched by this PR)

dotai.js:573-596 builds the result panel via innerHTML with raw ${rewrittenPrompt} and ${jsonString} interpolation. The provider response is trusted today, but this is an XSS sink waiting for the day someone wires up a less-vetted provider. Not introduced here — flagging because the PR is editing this exact template.

🟢 Clean

  • BulkEmbeddingsRunner host resolution + integration test exercising the site-only-config path: good fix, good regression coverage.
  • LangChain4jAIClient.toMessages multimodal handling is the right shape; tests confirm the array isn't stringified.
  • dot-ai-config-detail.component.ts dall-e-3gpt-image-1: trivially correct.

  • Read PR diff vs origin/main
  • Review backend Java changes
  • Review frontend changes
  • Review tests
  • Post review feedback
    Branch

Comment thread dotCMS/src/main/java/com/dotcms/ai/api/BulkEmbeddingsRunner.java
Copy link
Copy Markdown
Member

@wezell wezell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice.

Comment thread dotCMS/src/main/webapp/html/portlet/ext/dotai/dotai.js
Comment thread dotCMS/src/main/webapp/html/portlet/ext/dotai/dotai.js
@KevinDavilaDotCMS
Copy link
Copy Markdown
Member

I think we can use input number instead of the dropdown with hardcoded values in

                        <td>
                            <select name="size" style="min-width:400px;">
                                <option value="1024x1024">1024x1024 (Square)</option>
                                <option value="1024x1792">1024x1792 (Vertical)</option>
                                <option value="1792x1024" selected>1792x1024 (Horizontal)</option>


                            </select>
                        </td>

Because some models cannot generate in the hardcoded values

OR, since this is a sandbox, just keep square here, maybe the most supported value from models

@ihoffmann-dot ihoffmann-dot enabled auto-merge May 21, 2026 02:13
@ihoffmann-dot ihoffmann-dot added this pull request to the merge queue May 21, 2026
Any commits made after this event will not be merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code Area : Frontend PR changes Angular/TypeScript frontend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Dot AI — Issues and Improvements

4 participants