Llava15ChatHandler - Cache Image Encoding #2222

**Is your feature request related to a problem? Please describe.**
In multi-turn conversations that include images, the image is re-encoded on every turn. This slows inference considerably, especially when running CPU-only.

**Describe the solution you'd like**
Once an image has been encoded, the result should be reused on later turns instead of being re-encoded every time.
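A minimal sketch of the idea, assuming the handler can route its image-encoding step through a cache keyed on the image content. `ImageEmbedCache` and `encode_fn` are hypothetical names, not part of llama-cpp-python; `encode_fn` stands in for whatever expensive encoding the handler does today. This only avoids the encoder cost. Whether the model still has to re-evaluate the image tokens depends on whether the KV cache is reused as well.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

E = TypeVar("E")


class ImageEmbedCache(Generic[E]):
    """Hypothetical LRU cache mapping image bytes -> encoded embedding."""

    def __init__(self, encode_fn: Callable[[bytes], E], max_entries: int = 8) -> None:
        self._encode_fn = encode_fn  # stand-in for the expensive image encoder
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, E]" = OrderedDict()

    @staticmethod
    def key_for(image_bytes: bytes) -> str:
        # Key on content, not URL, so identical images share one entry.
        return hashlib.sha256(image_bytes).hexdigest()

    def get(self, image_bytes: bytes) -> E:
        key = self.key_for(image_bytes)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        embed = self._encode_fn(image_bytes)  # cache miss: encode once
        self._entries[key] = embed
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return embed


# Usage: a fake encoder that records how often it is called.
calls = []


def fake_encode(data: bytes) -> list:
    calls.append(data)
    return [float(len(data))]


cache = ImageEmbedCache(fake_encode, max_entries=2)
img = b"\x89PNG fake image bytes"
cache.get(img)  # turn 1: encodes the image
cache.get(img)  # turn 2: served from the cache
print(len(calls))
```

Keying on a content hash rather than the `image_url` string means the same image sent again (for example re-uploaded as a data URI) still hits the cache.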

**Describe alternatives you've considered**
Storing a detailed description of the image in the conversation history and removing the image_url part. This is less flexible, however: you can't later ask about something specific that the saved description didn't capture.

**Additional context**
I have implemented a solution locally by editing the source code (with heavy AI assistance, as I am not an expert in low-level LLM code), and the feature seems to work. It stores chunk signatures instead of processing each chunk immediately, and manipulates the KV cache via `kv_cache_seq_rm`, but I'm not sure whether it's a safe or sustainable approach.
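A rough sketch of how the chunk-signature approach described above could decide what to keep, assuming each prompt is split into text and image chunks and a signature list is stored for whatever is currently in the KV cache. `Chunk` and `plan_reuse` are hypothetical; the sketch does not call any llama-cpp-python API. It only computes the position from which cached entries would be discarded (the role `kv_cache_seq_rm` plays in the local patch) and which chunks still need evaluating. The 576-token image size in the usage example is illustrative.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass(frozen=True)
class Chunk:
    """Hypothetical prompt chunk: text token ids or raw image bytes."""

    payload: Union[Tuple[int, ...], bytes]
    n_tokens: int  # KV-cache positions the chunk occupies once evaluated

    def signature(self) -> str:
        h = hashlib.sha256()
        if isinstance(self.payload, bytes):
            h.update(b"img:")  # type prefix avoids text/image collisions
            h.update(self.payload)
        else:
            h.update(b"txt:")
            h.update(",".join(map(str, self.payload)).encode())
        return h.hexdigest()


def plan_reuse(cached_sigs: Sequence[str], new_chunks: Sequence[Chunk]) -> Tuple[int, int]:
    """Return (n_reused_chunks, keep_until_pos).

    The leading chunks that match the cached signatures can stay in the KV
    cache; entries at positions >= keep_until_pos would be removed, and only
    new_chunks[n_reused_chunks:] need to be evaluated.
    """
    n_reused = 0
    keep_until_pos = 0
    for cached_sig, chunk in zip(cached_sigs, new_chunks):
        if cached_sig != chunk.signature():
            break  # first mismatch: everything from here on is stale
        n_reused += 1
        keep_until_pos += chunk.n_tokens
    return n_reused, keep_until_pos


# Usage: turn 2 repeats turn 1 and appends new text.
img = b"fake image bytes"
turn1 = [Chunk((1, 2, 3), 3), Chunk(img, 576), Chunk((4, 5), 2)]
cached = [c.signature() for c in turn1]
turn2 = turn1 + [Chunk((6, 7, 8), 3)]
print(plan_reuse(cached, turn2))  # (3, 581)
```

This is only safe if the stored signatures describe exactly what is in the KV cache for that sequence: any evaluation that bypasses the bookkeeping (another prompt, a different sequence id, a model reload) must invalidate the signature list.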

I could share the source code or raise a draft PR if it'd be useful.

Thanks!
