
Add Q4_K/Q5_K/Q6_K GPU support via Q8_0 dequantization#108

Open
AdamBien wants to merge 2 commits into beehive-lab:main from AdamBien:main
Conversation

@AdamBien
Contributor

  • Add GPU support for K-quant models (Q4_K_M, Q5_K_M, Q6_K) via load-time dequantization to Q8_0
  • Add new FloatTensor implementations: Q4_KFloatTensor, Q5_KFloatTensor, Q6_KFloatTensor
  • Dequantization correctly accounts for TornadoVM's 16-byte ARRAY_HEADER memory layout
  • Centralize the weight-loading log message in AbstractModelLoader so it shows the actual model quantization (e.g. "Q4_K_M -> Q8_0")
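
The load-time requantization to Q8_0 described above could look roughly like this minimal sketch. The class and method names (`Q8_0Requant`, `quantizeBlock`, `dequantizeBlock`) are illustrative, not the PR's actual API; the `ARRAY_HEADER` constant only mirrors the 16-byte TornadoVM offset mentioned above, and GGUF stores the Q8_0 scale as fp16 rather than the float used here:

```java
// Sketch of Q8_0 (re)quantization: one scale per 32-value block plus 32 signed bytes.
// All names here are illustrative, not the PR's actual classes.
public class Q8_0Requant {
    static final int BLOCK_SIZE = 32;    // Q8_0 groups 32 values per block
    static final int ARRAY_HEADER = 16;  // TornadoVM off-heap arrays carry a 16-byte header
                                         // that byte offsets must skip (assumption: handled
                                         // by the loader before blocks are written)

    /** Quantize one block of 32 floats; returns the block scale, writes 32 int8 values. */
    static float quantizeBlock(float[] values, int offset, byte[] out, int outOffset) {
        float amax = 0f;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            amax = Math.max(amax, Math.abs(values[offset + i]));
        }
        float scale = amax / 127f;
        float inv = scale == 0f ? 0f : 1f / scale;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            out[outOffset + i] = (byte) Math.round(values[offset + i] * inv);
        }
        return scale;
    }

    /** Dequantize one Q8_0 block back to floats. */
    static void dequantizeBlock(byte[] q, int qOffset, float scale, float[] out, int outOffset) {
        for (int i = 0; i < BLOCK_SIZE; i++) {
            out[outOffset + i] = q[qOffset + i] * scale;
        }
    }

    public static void main(String[] args) {
        float[] x = new float[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) x[i] = (i - 16) * 0.25f;
        byte[] q = new byte[BLOCK_SIZE];
        float scale = quantizeBlock(x, 0, q, 0);
        float[] y = new float[BLOCK_SIZE];
        dequantizeBlock(q, 0, scale, y, 0);
        float maxErr = 0f;
        for (int i = 0; i < BLOCK_SIZE; i++) maxErr = Math.max(maxErr, Math.abs(x[i] - y[i]));
        System.out.println("max round-trip error = " + maxErr + ", scale = " + scale);
    }
}
```

At load time the K-quant weights would first be dequantized to floats by the new FloatTensor implementations, then fed block-by-block through a routine like `quantizeBlock` to produce Q8_0 buffers the existing GPU kernels already understand.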

Tested with:
./llamaTornado --gpu --verbose-init --metal --model /Users/abien/work/workspaces/llms/Devstral-Small-2-24B-Instruct-2512-Q4_K_M.gguf --prompt "who are you?" --gpu-memory 30GB

