ggml-cpu: Enable tiled matmul on AIX #25199

Open
shalinib-ibm wants to merge 1 commit into ggml-org:master from shalinib-ibm:patch-2

@shalinib-ibm

Contributor

The matmul_tiled path uses large local stack buffers for A_pack and B_pack. On AIX this can trigger a segmentation fault, so this change reduces the buffer footprint on AIX to keep the tiled path usable there.

Performance Impact:
Roughly 2x gains in prompt-processing speed (PP_Speed) for FP32, Q4_0 and Q8_0 models, tested with llama-bench, llama-batched-bench and llama-cli.
Models used: Llama 3.2 3B Instruct (F32), Qwen 2.5 3B (Q4_0 and Q8_0)
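
To make the stack-footprint argument concrete, here is a minimal sketch of how tile sizes translate into packing-buffer bytes. It is illustrative only: `TileConfig`, `default_tiles`, `platform_tiles` and `pack_bytes` are hypothetical names, the buffer shapes (an mc x kc panel for A_pack and a kc x nc panel for B_pack) are an assumption about the layout, and the 256 used for non-AIX platforms is a placeholder rather than a value taken from this PR. Only the 64/64/64 tile sizes come from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tile configuration; field names follow the mc/nc/kc
// variables visible in the diff, the struct itself is illustrative.
struct TileConfig {
    std::int64_t mc;
    std::int64_t nc;
    std::int64_t kc;
};

// Pick smaller tiles on AIX. 64 comes from the diff; 256 for other
// platforms is a placeholder assumption, not a value from the PR.
inline TileConfig default_tiles(bool is_aix) {
    return is_aix ? TileConfig{64, 64, 64} : TileConfig{256, 256, 256};
}

// _AIX is predefined by compilers targeting AIX.
inline TileConfig platform_tiles() {
#if defined(_AIX)
    return default_tiles(true);
#else
    return default_tiles(false);
#endif
}

// Bytes needed for an A panel (mc x kc) plus a B panel (kc x nc),
// assuming each packed element occupies elem_size bytes.
inline std::size_t pack_bytes(const TileConfig & t, std::size_t elem_size) {
    assert(t.mc > 0 && t.nc > 0 && t.kc > 0);
    return static_cast<std::size_t>(t.mc * t.kc + t.kc * t.nc) * elem_size;
}
```

Under these assumptions and with 4-byte elements, the 64-wide configuration needs 32 KiB for both panels, while the placeholder 256-wide configuration would need 512 KiB, which is the kind of difference that matters when the buffers live on a thread's stack.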

shalinib-ibm requested a review from ggerganov as a code owner July 1, 2026 12:17
github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Jul 1, 2026

Update sgemm.cpp
@shalinib-ibm

Contributor Author

@taronaeo @ggerganov Could you please help review this PR?

}
if (n_aligned > 0) {
    if (n_aligned % 64 == 0) nc = 64;
    if (n_aligned % n_chunk == 0) nc = n_chunk;

Member

Suggested change
if (n_aligned % n_chunk == 0) nc = n_chunk;
if (n_aligned % n_chunk == 0) nc = n_chunk;

Code-align
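
For readers following the hunk above, the chunk selection can be sketched as a small standalone function. This is a rough illustration, not the code in sgemm.cpp: `pick_nc` and its `nc_default` parameter are hypothetical, and the only behavior taken from the diff is "use `n_chunk` as the column tile when `n_aligned` is positive and divisible by it".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the reviewed line: choose the column
// tile size nc. If n_aligned is positive and an exact multiple of
// n_chunk, use n_chunk; otherwise keep the caller-supplied default.
inline std::int64_t pick_nc(std::int64_t n_aligned,
                            std::int64_t n_chunk,
                            std::int64_t nc_default) {
    assert(n_chunk > 0);  // guards the modulo below
    std::int64_t nc = nc_default;
    if (n_aligned > 0 && n_aligned % n_chunk == 0) {
        nc = n_chunk;
    }
    return nc;
}
```

Taking the chunk size as a parameter instead of hard-coding 64 lets a platform with a tighter stack budget pass a smaller value without touching the selection logic.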

}

void matmul(int64_t m, int64_t n) {
    int64_t mc = 64; int64_t nc = 64; int64_t kc = 64;

Member

I'm not sure about this styling. Does clang-format keep this styling or does it expand into 3 separate lines?

2 participants