ggml-cpu: Enable tiled matmul on AIX by shalinib-ibm · Pull Request #25199 · ggml-org/llama.cpp

shalinib-ibm · 2026-07-01T12:17:39Z

The matmul_tiled path uses large local stack buffers for A_pack and B_pack. On AIX this can trigger a segmentation fault, so this change reduces the buffer footprint on AIX to keep the tiled path usable there.
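The packing buffers grow with the tile dimensions, which is why shrinking the tiles helps. The sketch below assumes A_pack holds one mc x kc tile and B_pack one kc x nc tile of `float`, so their combined footprint is (mc*kc + kc*nc) * sizeof(float). `tile_chunk()` and `pack_bytes()` are hypothetical helpers. The `_AIX` macro is predefined by AIX compilers. The AIX chunk value of 32 is an illustrative assumption, not the value chosen in this PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: tile chunk used for mc/nc/kc.
// Assumption: AIX gets a smaller chunk so the on-stack packing buffers shrink.
constexpr int64_t tile_chunk() {
#if defined(_AIX)
    return 32;  // illustrative value only
#else
    return 64;
#endif
}

// Bytes needed for a packed A tile (mc x kc) plus a packed B tile (kc x nc),
// assuming one element of elem_size bytes per matrix entry.
constexpr std::size_t pack_bytes(int64_t mc, int64_t nc, int64_t kc, std::size_t elem_size) {
    return static_cast<std::size_t>(mc * kc + kc * nc) * elem_size;
}

// With 64-wide tiles: (64*64 + 64*64) * 4 bytes = 32768 bytes.
static_assert(pack_bytes(64, 64, 64, sizeof(float)) == 32768, "64-tile footprint");
```

Halving the chunk cuts this footprint by a factor of four (8192 bytes for 32-wide tiles), because each buffer scales with the product of two tile dimensions.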

Performance Impact:
~2x gain in prompt-processing speed (PP_Speed) for FP32, Q4_0 and Q8_0 models, tested with llama-bench, llama-batched-bench and llama-cli.
Models used: Llama 3.2 3B Instruct (F32), Qwen 2.5 3B (Q4_0 and Q8_0)

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:


shalinib-ibm · 2026-07-01T13:17:03Z

@taronaeo @ggerganov Could you please help review this PR?

taronaeo · 2026-07-02T11:54:04Z

        }
        if (n_aligned > 0) {
-            if (n_aligned % 64 == 0)      nc = 64;
+            if (n_aligned % n_chunk == 0)      nc = n_chunk;


Suggested change

if (n_aligned % n_chunk == 0) nc = n_chunk;

Code-align
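To illustrate what the hunk above changes, here is a sketch of an nc selection parameterized by n_chunk instead of a hard-coded 64. `pick_nc` is a hypothetical function. The fallback to smaller power-of-two divisors of n_chunk, the `min_chunk` floor, and the "return 0 means keep the default" convention are all assumptions, because the rest of the else-if ladder in sgemm.cpp is not shown in this comment.

```cpp
#include <cstdint>

// Hypothetical sketch: pick a column tile width nc that divides n_aligned.
// Tries n_chunk, n_chunk/2, n_chunk/4, ... down to min_chunk (assumed fallback).
// Returns 0 when no candidate divides n_aligned evenly (hypothetical convention:
// the caller keeps its default nc in that case).
constexpr int64_t pick_nc(int64_t n_aligned, int64_t n_chunk, int64_t min_chunk = 8) {
    if (n_aligned <= 0 || n_chunk <= 0 || min_chunk < 1) {
        return 0;
    }
    for (int64_t c = n_chunk; c >= min_chunk; c /= 2) {
        if (n_aligned % c == 0) {
            return c;
        }
    }
    return 0;
}
```

With n_chunk = 64 the first check matches the pre-patch `n_aligned % 64 == 0` branch. A smaller n_chunk on AIX yields a smaller nc, and with it a smaller B_pack, assuming B_pack is sized by nc.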

taronaeo · 2026-07-02T11:54:57Z

    }

    void matmul(int64_t m, int64_t n) {
+        int64_t mc = 64; int64_t nc = 64; int64_t kc = 64;


I'm not sure about this styling. Does clang-format keep this styling or does it expand into 3 separate lines?
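For context on what mc, nc and kc control, here is a minimal blocked (tiled) matmul in plain scalar C++. The tile sizes are declared one per line, which is the expanded form the question above refers to. Each A tile (mc x kc) and B tile (kc x nc) is packed into a local buffer before the inner kernel runs. The names `matmul_tiled_sketch`, `A_pack` and `B_pack` mirror the PR's terminology, but this is a hypothetical illustration, not the PowerPC implementation in sgemm.cpp. It also differs in one respect: here the buffers are `std::vector`s on the heap, while the PR keeps stack buffers and reduces their size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Row-major C[m x n] += A[m x k] * B[k x n], processed in mc x nc x kc tiles.
// The caller must initialize C (e.g. to zero).
void matmul_tiled_sketch(const float* A, const float* B, float* C,
                         int64_t m, int64_t n, int64_t k) {
    const int64_t mc = 64;
    const int64_t nc = 64;
    const int64_t kc = 64;
    assert(m >= 0 && n >= 0 && k >= 0);

    std::vector<float> A_pack(mc * kc);  // one A tile, row-major mc x kc
    std::vector<float> B_pack(kc * nc);  // one B tile, row-major kc x nc

    for (int64_t i0 = 0; i0 < m; i0 += mc) {
        const int64_t mb = std::min(mc, m - i0);
        for (int64_t p0 = 0; p0 < k; p0 += kc) {
            const int64_t kb = std::min(kc, k - p0);
            // Pack rows i0..i0+mb-1, columns p0..p0+kb-1 of A.
            for (int64_t i = 0; i < mb; ++i)
                for (int64_t p = 0; p < kb; ++p)
                    A_pack[i * kc + p] = A[(i0 + i) * k + (p0 + p)];
            for (int64_t j0 = 0; j0 < n; j0 += nc) {
                const int64_t nb = std::min(nc, n - j0);
                // Pack rows p0..p0+kb-1, columns j0..j0+nb-1 of B.
                for (int64_t p = 0; p < kb; ++p)
                    for (int64_t j = 0; j < nb; ++j)
                        B_pack[p * nc + j] = B[(p0 + p) * n + (j0 + j)];
                // Inner kernel: multiply the packed tiles and accumulate into C.
                for (int64_t i = 0; i < mb; ++i) {
                    for (int64_t j = 0; j < nb; ++j) {
                        float acc = 0.0f;
                        for (int64_t p = 0; p < kb; ++p)
                            acc += A_pack[i * kc + p] * B_pack[p * nc + j];
                        C[(i0 + i) * n + (j0 + j)] += acc;
                    }
                }
            }
        }
    }
}
```

Each packed element is reused across a whole tile, which is where tiling gets its speedup. The cost is buffer space proportional to mc*kc and kc*nc, which is exactly what becomes a problem when those buffers live on a small stack.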

shalinib-ibm requested a review from ggerganov as a code owner on July 1, 2026 12:17

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jul 1, 2026

shalinib-ibm force-pushed the patch-2 branch from cd3e655 to c1c43a9 on July 1, 2026 12:46

taronaeo approved these changes on Jul 2, 2026

shalinib-ibm wants to merge 1 commit into ggml-org:master from shalinib-ibm:patch-2