sycl: simplify bin_bcast_kernel #13383
base: master
Conversation
std::size_t num_dst_elements = static_cast<std::size_t>(ne0) * static_cast<std::size_t>(ne1) *
                               static_cast<std::size_t>(ne2) * static_cast<std::size_t>(ne3);
std::size_t local_range = 256;
Is there a specific reason for using 256 and not something else, like 128?
No reason, just a high enough default value.
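For context, here is a minimal sketch (assumed names, not the PR's exact code) of how num_dst_elements and a fixed local_range of 256 would typically drive a flattened 1D launch: the global range is rounded up to a multiple of the work-group size and the tail work-items are guarded. The function name, the USM dst pointer, and the placeholder kernel body are illustration-only assumptions.

#include <sycl/sycl.hpp>
#include <cstddef>

// Sketch only: flatten the launch over all destination elements.
static void launch_flat(sycl::queue & q, float * dst, std::size_t num_dst_elements) {
    const std::size_t local_range = 256;
    // round the global range up to a multiple of the work-group size
    const std::size_t global_range =
        ((num_dst_elements + local_range - 1) / local_range) * local_range;

    q.parallel_for(
        sycl::nd_range<1>{sycl::range<1>{global_range}, sycl::range<1>{local_range}},
        [=](sycl::nd_item<1> it) {
            const std::size_t i = it.get_global_linear_id();
            if (i >= num_dst_elements) {
                return; // tail work-item from the rounded-up range
            }
            dst[i] = 0.0f; // placeholder for the actual binary op
        });
}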
LGTM!
Force-pushed from 0aa9325 to 6c2091f
This PR simplifies the bin-bcast kernel and adds a special code path for when all the inputs are contiguous, avoiding unnecessary index calculations. The current bin-bcast implementation launches either a 3D grid or a 1D grid, and the former is often limiting in the number of work-items it can accommodate.
This PR completely flattens the kernel, which also makes it easier to check for contiguous memory accesses. The separate contiguous path also opens up the possibility of vectorization later on, though in my current testing it did not bring a meaningful difference in performance.
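To illustrate the index calculations the contiguous path avoids, here is a minimal sketch (assumed names such as decompose and ne10..ne13 for the broadcast source's extents; not the PR's exact code): the general path recovers 4D coordinates from the flat destination index and wraps them into the source's extents, while the contiguous fast path can use the flat index directly.

#include <cstddef>

// Sketch only: 4D coordinates recovered from a flat index, then wrapped into
// the (possibly smaller) extents of the broadcast source tensor.
struct bcast_idx { std::size_t i0, i1, i2, i3; };

static bcast_idx decompose(std::size_t i,
                           std::size_t ne0,  std::size_t ne1,  std::size_t ne2,
                           std::size_t ne10, std::size_t ne11, std::size_t ne12, std::size_t ne13) {
    const std::size_t i3 =  i / (ne2 * ne1 * ne0);
    const std::size_t i2 = (i / (ne1 * ne0)) % ne2;
    const std::size_t i1 = (i /  ne0)        % ne1;
    const std::size_t i0 =  i                % ne0;
    // broadcasting: wrap each destination coordinate into the source's extent
    return { i0 % ne10, i1 % ne11, i2 % ne12, i3 % ne13 };
}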
This PR also brings a minor but consistent improvement of around 1 tk/s on some models.
Performance compared with the following parameters (with -mmp 0 -ngl 99 -t 8):
Intel Lunar Lake 140V iGPU
Intel Data Center Max 1100