fix(llm): auto-shape multimodal mediaPath messages in chat template #1089
Merged
msluszniak merged 1 commit into main on Apr 22, 2026
barhanc approved these changes, Apr 22, 2026
msluszniak added a commit that referenced this pull request, Apr 22, 2026
## Description
`LLMController.forward` passed `imagePaths` straight through to
`nativeModule.generateMultimodal` with no normalization. The native side
requires the `file://` prefix; without it, native throws `"Read image
error: invalid argument"` with no further context. Callers can plausibly
arrive with either form:
- `ResourceFetcher.fetch` returns raw paths *without* `file://` (per its
own docstring on the `fetch` method).
- Platform image-picker APIs (e.g. `expo-image-picker`) typically return
`file:///...` URIs.
- The same path string passed to a vision module's `forward(...)` works
either way; the asymmetry between vision modules and multimodal LLM is
undocumented.
This PR normalizes each image path inside `LLMController.forward` so
both forms work, and updates the JSDoc on `Message.mediaPath` and
`LLMModule.forward.imagePaths` to document the new contract.
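The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `normalizeImagePath` is hypothetical, and the sketch assumes inputs are either absolute bare paths or `file://` URIs (other schemes and relative paths are out of scope here).

```typescript
// Hypothetical helper; the name and exact rules are assumptions for illustration.
const FILE_SCHEME = 'file://';

function normalizeImagePath(path: string): string {
  // Already a file URI: return it untouched so previously-working calls do not change.
  if (path.startsWith(FILE_SCHEME)) {
    return path;
  }
  // Bare absolute path such as '/absolute/path/to/img.jpg': prepend the scheme.
  return `${FILE_SCHEME}${path}`;
}

// Both accepted forms end up as the same file URI.
const normalized = [
  'file:///absolute/path/to/img.jpg',
  '/absolute/path/to/img.jpg',
].map(normalizeImagePath);
```

Because the function returns prefixed inputs unchanged, applying it more than once is harmless, which is consistent with the "strictly additive" claim below.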
### Introduces a breaking change?
- [ ] Yes
- [x] No
Strictly additive: previously-working calls (paths with `file://`) keep
working unchanged. Previously-failing calls (paths without `file://`)
now succeed.
### Type of change
- [x] Bug fix (change which fixes an issue)
- [ ] New feature
- [ ] Documentation update
- [ ] Other
### Tested on
- [ ] iOS
- [ ] Android
The bare-path failure was reproduced on Android (Samsung Galaxy S24
Ultra) with LFM2-VL-1.6B while building a downstream consumer; both
forms tested manually post-fix on the same device. Re-verification of
both forms on iOS is recommended.
### Testing instructions
```ts
import { LLMModule, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';

const llm = await LLMModule.fromModelName(LFM2_VL_1_6B_QUANTIZED);

// Both should now work; previously only the first did.
await llm.generate([
  { role: 'user', content: 'Describe.', mediaPath: 'file:///absolute/path/to/img.jpg' },
]);
await llm.generate([
  { role: 'user', content: 'Describe.', mediaPath: '/absolute/path/to/img.jpg' },
]);
```
### Related issues
Addresses item 3 of #1086.
### Checklist
- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings
### Additional notes
The normalizer is module-scope (matching `messagesForChatTemplate` from
#1089) rather than a class method because it doesn't depend on
controller state.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLMController.generate() collected imagePaths from messages with a
mediaPath but did not transform their content into the array form
([{type:'image'}, {type:'text', text}]) that the chat template needs
to emit the image placeholder. Calling generate() directly with a
vision-capable model thus threw "More images paths provided than
'<image>' placeholders in prompt" from native. sendMessage() worked
because it built its own historyForTemplate that did the transformation.
Move the transformation into applyChatTemplate so both call sites get
correct behavior, and remove the now-redundant historyForTemplate block
from sendMessage. Public Message.content type unchanged; external
callers always pass plain strings, the controller handles the array
form internally.
Refs #1086 (items 1 and 2 — with item 1 fixed, item 2's type mismatch
no longer surfaces because external callers never need to construct
the array form themselves).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
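To illustrate why the unshaped message triggers the placeholder-count error, here is a toy model of the mismatch. Everything in it is a stand-in: `renderToy`, `countPlaceholders`, and the local types are hypothetical, and the real prompt is produced by the library's chat-template engine rather than by string concatenation.

```typescript
// Hypothetical stand-in types; not the library's definitions.
type ContentPart = { type: 'image' } | { type: 'text'; text: string };
type TemplateMessage = { role: string; content: string | ContentPart[] };

const IMAGE_PLACEHOLDER = '<image>';

// Toy renderer: string content is emitted verbatim, image parts become a placeholder.
function renderToy(messages: TemplateMessage[]): string {
  return messages
    .map((m) => {
      const body =
        typeof m.content === 'string'
          ? m.content
          : m.content.map((p) => (p.type === 'image' ? IMAGE_PLACEHOLDER : p.text)).join('');
      return `${m.role}: ${body}`;
    })
    .join('\n');
}

// Counts how many image placeholders the rendered prompt contains.
function countPlaceholders(prompt: string): number {
  return prompt.split(IMAGE_PLACEHOLDER).length - 1;
}

// One image path was collected from the message's mediaPath.
const imagePaths = ['file:///absolute/path/to/img.jpg'];

// Plain-string content renders no placeholder; array-form content renders one.
const unshapedCount = countPlaceholders(renderToy([{ role: 'user', content: 'Describe.' }]));
const shapedCount = countPlaceholders(
  renderToy([{ role: 'user', content: [{ type: 'image' }, { type: 'text', text: 'Describe.' }] }]),
);

// A positive surplus corresponds to the "more image paths than placeholders" condition.
const surplusUnshaped = imagePaths.length - unshapedCount;
const surplusShaped = imagePaths.length - shapedCount;
```

In this toy model the plain-string message leaves one image path without a matching placeholder, while the array-form message balances the two counts.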
61874ea to d62a91e
## Description
`LLMController.generate()` collected `imagePaths` from messages with a `mediaPath` set, but never transformed their `content` into the `[{type:'image'}, {type:'text', text}]` form that the chat template needs to emit the `<image>` placeholder. Calling `generate()` directly with a vision-capable model (e.g. LFM2-VL) thus threw `"More images paths provided than '<image>' placeholders in prompt"` from native, even though `sendMessage()` worked because it built its own `historyForTemplate` that did the transformation.
This PR moves the transformation into `applyChatTemplate` so both call sites (`generate` and `sendMessage`) get the correct behavior, and removes the now-redundant `historyForTemplate` block from `sendMessage`. The public `Message.content` type stays `string`: external callers always pass plain strings; the controller handles the structured array form internally.
The helper is idempotent: messages whose `content` is already an array (e.g. callers who pre-shaped it as a workaround) are passed through unchanged.
### Introduces a breaking change?
Public types are unchanged. `sendMessage` produces an identical rendered chat-template string (the transformation just happens one step later in the pipeline; token count and rendered output are byte-identical). `generate` only changes behavior in cases that previously threw, so this is a pure bug fix.
### Type of change
### Tested on
The original bug was reproduced on a vision-capable model (LFM2-VL-1.6B-quantized) on Android while building a downstream consumer app. Re-verification of the fix on a real device is recommended before merge; see Testing instructions below.
I have not personally re-run the failing scenario after the fix.
### Testing instructions
To reproduce the original bug (without this PR):
With this PR applied, the same call should succeed and return the model's description.
Regression check: a vision-capable `sendMessage(text, { imagePath })` flow should continue producing identical output.
### Screenshots
N/A (controller change, no UI).
### Related issues
Addresses items 1 and 2 of #1086. With item 1 fixed, item 2's `Message.content` type mismatch no longer surfaces in practice because external callers never need to construct the array form themselves (the `as unknown as string` workaround that motivated #2 becomes unnecessary).
### Checklist
### Additional notes
The `messagesForChatTemplate` helper lives at module scope rather than as a static class method because it doesn't depend on controller state. Internal `any[]` return is a deliberate concession to the dynamic shape the chat-template engine accepts; the public `Message[]` input/output contract stays well-typed.
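A module-scope shaping helper of this kind might look roughly like the sketch below. It is not the PR's implementation: the name `shapeForChatTemplate` and the local types are hypothetical, and `content` is widened to `string | Part[]` here only so the sketch type-checks on its own (the PR keeps the public type as `string` and returns `any[]` internally).

```typescript
// Hypothetical local types, simplified for illustration.
type Part = { type: 'image' } | { type: 'text'; text: string };

interface SketchMessage {
  role: 'user' | 'assistant' | 'system';
  content: string | Part[];
  mediaPath?: string;
}

// Module-scope function: it needs no controller state, only its input.
function shapeForChatTemplate(messages: SketchMessage[]): SketchMessage[] {
  return messages.map((message) => {
    // No image attached, or content already in array form: pass through unchanged.
    if (message.mediaPath === undefined || typeof message.content !== 'string') {
      return message;
    }
    // Plain text plus an image: emit the array form the chat template expects.
    const parts: Part[] = [{ type: 'image' }, { type: 'text', text: message.content }];
    return { ...message, content: parts };
  });
}

const once = shapeForChatTemplate([
  { role: 'user', content: 'Describe.', mediaPath: '/absolute/path/to/img.jpg' },
  { role: 'assistant', content: 'A cat.' },
]);

// Applying the helper a second time leaves already-shaped messages as they are.
const twice = shapeForChatTemplate(once);
```

The early return for non-string `content` is what makes the sketch idempotent, so pre-shaped input from callers using the earlier workaround would not be wrapped a second time.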