Add Apple Metal backend support by ArturSkowronski · Pull Request #103 · beehive-lab/GPULlama3.java

ArturSkowronski · 2026-04-07T21:20:55Z

Add Apple Metal backend support to the llama-tornado launcher, enabling GPULlama3 to run on macOS with TornadoVM's native Metal driver (shipped in TornadoVM 4.0+).

Changes (launcher only, no Java code changes):

Add METAL variant to the Backend enum
Add --metal CLI flag for backend selection
Configure Metal-specific module path (tornado.drivers.metal) and export list (metal-exports)

The TornadoVM API is backend-agnostic, so the Java inference code works without modification - only the launcher needed updating.
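
The three launcher changes listed above could look roughly like the sketch below. It is not the actual llama-tornado code: the function names, the default backend, and the idea that OpenCL and PTX follow the same `tornado.drivers.<name>` / `<name>-exports` naming pattern are assumptions made for illustration. Only the Metal module path and export list are taken from this PR.

```python
import argparse
from enum import Enum


class Backend(Enum):
    """Backends the launcher can select; METAL is the variant this PR adds."""
    OPENCL = "opencl"
    PTX = "ptx"
    METAL = "metal"


def driver_settings(backend: Backend) -> tuple[str, str]:
    """Return (driver module, export list) for a backend.

    Only the Metal pair (tornado.drivers.metal, metal-exports) is stated in
    the PR; reusing the same naming pattern for the other backends is an
    assumption of this sketch.
    """
    return f"tornado.drivers.{backend.value}", f"{backend.value}-exports"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one mutually exclusive flag per backend."""
    parser = argparse.ArgumentParser(prog="launcher-sketch")
    group = parser.add_mutually_exclusive_group()
    for backend in Backend:
        group.add_argument(f"--{backend.value}", dest="backend",
                           action="store_const", const=backend,
                           help=f"use the {backend.name} backend")
    parser.set_defaults(backend=Backend.OPENCL)  # default is an assumption
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--metal"])
    module, exports = driver_settings(args.backend)
    print(args.backend.name, module, exports)
```

Keeping the flag name, module path, and export list derived from a single enum value means a new backend only needs one new enum member, which fits the "launcher only" scope of this change.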

Motivation

TornadoVM 4.0 shipped a Metal backend (PR #796), but llama-tornado only supported --opencl and --ptx. The GPULlama3 README still notes:

"TornadoVM does not have a Metal backend yet [...] until we add a Metal backend to TornadoVM and start optimizing it."

TornadoVM 4.0 has now added it - this PR enables GPULlama3 to use it 😊

Benchmark Results

Tested on Apple M1 Pro (macOS, ARM64) with Llama-3.2-1B-Instruct-f16.gguf.

LLM Inference (GPULlama3)

Backend	TornadoVM	JDK	tok/s	Tokens	Time (s)
OpenCL	2.2.0	21 (GraalVM CE)	6.35	47	7.40
OpenCL	3.0.0	25 (Temurin)	6.87	47	6.84
OpenCL	4.0.0	21 (GraalVM CE)	6.48	57	8.79
Metal	4.0.0	21 (GraalVM CE)	0.23	44	189.85

VectorAdd (10M elements)

Backend	TornadoVM	Best (ms)	Throughput (GB/s)
OpenCL	2.2.0	2.704	41.34
OpenCL	3.0.0	2.427	46.04
OpenCL	4.0.0	2.695	41.47
Metal	4.0.0	2.392	46.72

Analysis

VectorAdd: Metal is competitive with OpenCL (46.72 GB/s vs 41.34 to 46.04 GB/s) and slightly faster on this simple parallel kernel. This matches expectations - the Metal backend handles straightforward array operations well.
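
As a sanity check on the VectorAdd table, the throughput column can be reproduced from the best times. This sketch assumes 4-byte floats, three arrays of 10,000,000 elements moved per run (two inputs, one output), and that "GB" means 2^30 bytes. None of these are stated in the PR; they are inferred because they reproduce the reported figures to within rounding.

```python
N_ELEMENTS = 10_000_000   # "10M elements" from the table heading
BYTES_PER_ELEMENT = 4     # assumption: 32-bit floats
ARRAYS_MOVED = 3          # assumption: two inputs read, one output written
BYTES_PER_GB = 2 ** 30    # assumption: binary gigabytes

# (backend, TornadoVM version, best time in ms, reported throughput)
RESULTS = [
    ("OpenCL", "2.2.0", 2.704, 41.34),
    ("OpenCL", "3.0.0", 2.427, 46.04),
    ("OpenCL", "4.0.0", 2.695, 41.47),
    ("Metal", "4.0.0", 2.392, 46.72),
]


def throughput(best_ms: float) -> float:
    """Bytes moved per second, expressed in 2^30-byte gigabytes."""
    total_bytes = N_ELEMENTS * BYTES_PER_ELEMENT * ARRAYS_MOVED
    return total_bytes / (best_ms / 1000.0) / BYTES_PER_GB


if __name__ == "__main__":
    for backend, version, best_ms, reported in RESULTS:
        print(f"{backend} {version}: computed {throughput(best_ms):.2f}, "
              f"reported {reported:.2f}")
```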

LLM inference: Metal is ~28x slower than OpenCL (0.23 vs 6.48 tok/s). This is consistent with the known state of the Metal backend:

TornadoVM PR #796 notes that MSL-specific optimizations (threadgroup memory, SIMD shuffle, async copies) are not yet implemented
The Metal backend test pass rate does not yet match OpenCL/PTX/SPIR-V
LLM inference involves complex control flow and frequent CPU ↔ GPU transfers - precisely the workloads where the immature Metal backend struggles
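
The tok/s column and the ~28x figure follow directly from the tokens and times in the LLM inference table. A minimal sketch of that arithmetic, using only numbers reported above (the helper names are illustrative):

```python
# (backend, TornadoVM version, tokens generated, wall time in seconds)
RUNS = [
    ("OpenCL", "2.2.0", 47, 7.40),
    ("OpenCL", "3.0.0", 47, 6.84),
    ("OpenCL", "4.0.0", 57, 8.79),
    ("Metal", "4.0.0", 44, 189.85),
]


def tokens_per_second(tokens: int, seconds: float) -> float:
    """Generation rate for a single run."""
    return tokens / seconds


def slowdown(baseline_rate: float, candidate_rate: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return baseline_rate / candidate_rate


if __name__ == "__main__":
    rates = {(b, v): tokens_per_second(t, s) for b, v, t, s in RUNS}
    for (backend, version), rate in rates.items():
        print(f"{backend} {version}: {rate:.2f} tok/s")
    # Compare both backends on the same TornadoVM release (4.0.0).
    ratio = slowdown(rates[("OpenCL", "4.0.0")], rates[("Metal", "4.0.0")])
    print(f"Metal is about {ratio:.0f}x slower than OpenCL")
```

Note that the runs generated different token counts (44 to 57), so the rates are comparable only approximately.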

Add --metal flag to enable running GPULlama3 with TornadoVM's Metal backend on macOS. This requires TornadoVM 4.0+ which ships the Metal driver (tornado.drivers.metal). Tested on Apple M1 Pro with TornadoVM 4.0.0-jdk21 Metal SDK.

CLAassistant · 2026-04-07T21:21:25Z

All committers have signed the CLA.

ArturSkowronski · 2026-04-07T21:36:34Z

Hey @mikepapadim - sharing my numbers and analysis 😁 I wanted to test it for the new JVM Weekly, here are my results.

I will also update my repo with the new TornadoVM: https://github.com/ArturSkowronski/conference-jvm-in-age-ai-2026

mikepapadim · 2026-04-08T08:35:24Z

Hello @ArturSkowronski, thank you for your contribution! That's great, actually.

Can you let me know which models you tested with the Metal backend?
I need to add them to the CI workflow; otherwise, I am quite happy to merge.

Also, can you please sign the CLA?

ArturSkowronski · 2026-04-08T13:25:01Z

@mikepapadim - Here you will find the whole "benchmark" I use 😊

ArturSkowronski/conference-jvm-in-age-ai-2026#13

Model under test from my side: Llama-3.2-1B-Instruct-f16.gguf

stratika · 2026-04-16T14:05:41Z

Hi @ArturSkowronski, it seems that the F16 models on Metal do not produce a correct response. Can you please confirm your OS version? I am currently on Tahoe 26.3.1 (Apple M4).

I am testing with:

./llama-tornado --gpu --metal --model /opt/models/Llama-3.2-1B-Instruct-F16.gguf --prompt "Tell me a joke"

Besides that, please sync with the latest main and update the pom file to use TornadoVM v4.0.0 to be sure that it is built with it.

Then I think your changes are very good, and I confirm that they work with the Q8 models. So, we can merge the PR.

mikepapadim self-requested a review April 7, 2026 21:27

ArturSkowronski changed the title ~~feat: add Apple Metal backend support~~ Add Apple Metal backend support Apr 7, 2026

mikepapadim marked this pull request as ready for review April 8, 2026 08:35

mikepapadim requested a review from orionpapadakis April 8, 2026 08:35

mikepapadim assigned ArturSkowronski Apr 8, 2026

ArturSkowronski wants to merge 1 commit into beehive-lab:main from ArturSkowronski:feat/metal-backend-support

4 participants
