fix(llm): auto-shape multimodal mediaPath messages in chat template#1089

Merged
msluszniak merged 1 commit into main from fix/generate-multimodal-content-shaping on Apr 22, 2026

Conversation


@msluszniak msluszniak commented Apr 22, 2026

Description

LLMController.generate() collected imagePaths from messages with a mediaPath set, but never transformed their content into the [{type:'image'}, {type:'text', text}] form that the chat template needs to emit the <image> placeholder. Calling generate() directly with a vision-capable model (e.g. LFM2-VL) thus threw "More images paths provided than '<image>' placeholders in prompt" from native, even though sendMessage() worked because it built its own historyForTemplate that did the transformation.

This PR moves the transformation into applyChatTemplate so both call sites (generate and sendMessage) get the correct behavior, and removes the now-redundant historyForTemplate block from sendMessage. The public Message.content type stays string — external callers always pass plain strings; the controller handles the structured array form internally.

The helper is idempotent: messages whose content is already an array (e.g. callers who pre-shaped it as a workaround) are passed through unchanged.
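The shaping and idempotence described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `messagesForChatTemplate` and the `[{type:'image'}, {type:'text', text}]` shape come from this description, but everything else (the exact `Message` fields, the pass-through conditions) is assumed:

```ts
// Hypothetical sketch of the messagesForChatTemplate helper; the exact
// implementation in the PR may differ.
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
  mediaPath?: string;
}

// Internal return is any[] on purpose: the chat-template engine accepts a
// dynamic content shape that the public Message type does not expose.
function messagesForChatTemplate(messages: Message[]): any[] {
  return messages.map((msg) => {
    // Idempotent: content pre-shaped as an array (e.g. via a caller's
    // `as unknown as string` workaround) passes through unchanged.
    if (Array.isArray(msg.content as unknown)) return msg;
    // Plain text messages without media stay plain strings.
    if (msg.mediaPath === undefined) return msg;
    // Shape media-bearing messages so the template emits an <image>
    // placeholder followed by the user's text.
    return {
      ...msg,
      content: [{ type: 'image' }, { type: 'text', text: msg.content }],
    };
  });
}
```

Because the helper runs inside applyChatTemplate, both generate and sendMessage would pick it up without any change to their public signatures.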

Introduces a breaking change?

  • Yes
  • No

Public types are unchanged. sendMessage produces an identical rendered chat-template string (the transformation just happens one step later in the pipeline; token count and rendered output are byte-identical). generate only changes behavior in cases that previously threw — pure bug fix.

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

The original bug was reproduced on a vision-capable model (LFM2-VL-1.6B-quantized) on Android while building a downstream consumer app. Re-verification of the fix on a real device is recommended before merge — see Testing instructions below. I have not personally re-run the failing scenario after the fix.

Testing instructions

To reproduce the original bug (without this PR):

```ts
import { LLMModule, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';

const llm = await LLMModule.fromModelName(LFM2_VL_1_6B_QUANTIZED);
await llm.generate([
  { role: 'user', content: 'Describe this image.', mediaPath: 'file:///path/to/image.jpg' },
]);
// Throws: "More images paths provided than '<image>' placeholders in prompt"
```

With this PR applied, the same call should succeed and return the model's description.

Regression check: a vision-capable sendMessage(text, { imagePath }) flow should continue producing identical output.

Screenshots

N/A (controller change, no UI).

Related issues

Addresses items 1 and 2 of #1086. With item 1 fixed, item 2's `Message.content` type mismatch no longer surfaces in practice because external callers never need to construct the array form themselves (the `as unknown as string` workaround that motivated item 2 becomes unnecessary).

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

The messagesForChatTemplate helper lives at module scope rather than as a static class method because it doesn't depend on controller state. Internal any[] return is a deliberate concession to the dynamic shape the chat-template engine accepts; the public Message[] input/output contract stays well-typed.

@msluszniak msluszniak self-assigned this Apr 22, 2026
@msluszniak msluszniak added the bug fix PRs that are fixing bugs label Apr 22, 2026
@msluszniak msluszniak requested a review from barhanc April 22, 2026 13:08
msluszniak added a commit that referenced this pull request Apr 22, 2026
## Description

`LLMController.forward` passed `imagePaths` straight through to
`nativeModule.generateMultimodal` with no normalization. The native side
requires the `file://` prefix; without it, native throws `"Read image
error: invalid argument"` with no further context. Callers can plausibly
arrive with either form:

- `ResourceFetcher.fetch` returns raw paths *without* `file://` (per its
own docstring on the `fetch` method).
- Platform image-picker APIs (e.g. `expo-image-picker`) typically return
`file:///...` URIs.
- The same path string passed to a vision module's `forward(...)` works
either way; the asymmetry between vision modules and multimodal LLM is
undocumented.

This PR normalizes each image path inside `LLMController.forward` so
both forms work, and updates the JSDoc on `Message.mediaPath` and
`LLMModule.forward.imagePaths` to document the new contract.
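A minimal sketch of the normalization this commit describes. The helper name and the scheme-detection regex are assumptions; the PR only states that each image path is normalized inside `LLMController.forward` so both bare paths and `file://` URIs reach native in the required form:

```ts
// Hypothetical helper name; the commit only says paths are normalized
// inside LLMController.forward before nativeModule.generateMultimodal.
function normalizeImagePath(path: string): string {
  // URIs that already carry a scheme (file://, content://, ...) pass through.
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(path)) return path;
  // Bare absolute paths gain the file:// prefix the native side requires.
  return `file://${path}`;
}
```

Inside `forward`, this would apply as something like `imagePaths.map(normalizeImagePath)`.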

### Introduces a breaking change?

- [ ] Yes
- [x] No

Strictly additive: previously-working calls (paths with `file://`) keep
working unchanged. Previously-failing calls (paths without `file://`)
now succeed.

### Type of change

- [x] Bug fix (change which fixes an issue)
- [ ] New feature
- [ ] Documentation update
- [ ] Other

### Tested on

- [ ] iOS
- [ ] Android

The bare-path failure was reproduced on Android (Samsung Galaxy S24
Ultra) with LFM2-VL-1.6B while building a downstream consumer; both
forms tested manually post-fix on the same device. Re-verification of
both forms on iOS is recommended.

### Testing instructions

```ts
import { LLMModule, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';
const llm = await LLMModule.fromModelName(LFM2_VL_1_6B_QUANTIZED);

// Both should now work; previously only the first did.
await llm.generate([
  { role: 'user', content: 'Describe.', mediaPath: 'file:///absolute/path/to/img.jpg' },
]);
await llm.generate([
  { role: 'user', content: 'Describe.', mediaPath: '/absolute/path/to/img.jpg' },
]);
```

### Related issues

Addresses item 3 of #1086.

### Checklist

- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings

### Additional notes

The normalizer is module-scope (matching `messagesForChatTemplate` from
#1089) rather than a class method because it doesn't depend on
controller state.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLMController.generate() collected imagePaths from messages with a
mediaPath but did not transform their content into the array form
([{type:'image'}, {type:'text', text}]) that the chat template needs
to emit the image placeholder. Calling generate() directly with a
vision-capable model thus threw "More images paths provided than
'<image>' placeholders in prompt" from native. sendMessage() worked
because it built its own historyForTemplate that did the transformation.

Move the transformation into applyChatTemplate so both call sites get
correct behavior, and remove the now-redundant historyForTemplate block
from sendMessage. Public Message.content type unchanged; external
callers always pass plain strings, the controller handles the array
form internally.

Refs #1086 (items 1 and 2 — with item 1 fixed, item 2's type mismatch
no longer surfaces because external callers never need to construct
the array form themselves).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@msluszniak msluszniak force-pushed the fix/generate-multimodal-content-shaping branch from 61874ea to d62a91e on April 22, 2026 15:41
@msluszniak msluszniak merged commit b271aa6 into main Apr 22, 2026
4 of 5 checks passed
@msluszniak msluszniak deleted the fix/generate-multimodal-content-shaping branch April 22, 2026 16:08