Pull requests: ggml-org/llama.cpp
server: add --no-sleep flag for GPU heartbeat on headless GPUs
CUDA
Related to the CUDA backend
ggml
changes relating to the ggml tensor library for machine learning
server
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
Vulkan
Issues specific to the Vulkan backend
#25214
opened Jul 1, 2026 by
johnkarlhill
rocm: fix mmap loading of large models
CUDA
Related to the CUDA backend
ggml
changes relating to the ggml tensor library for machine learning
#25212
opened Jul 1, 2026 by
pwilkin
Member
Optimize RWKV7 inference by fusing some graph operators
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
CUDA
Related to the CUDA backend
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
testing
Everything test related
Vulkan
Issues specific to the Vulkan backend
#25206
opened Jul 1, 2026 by
MollySophia
Collaborator
Draft
sycl: add GGML_SYCL_FATTN_VEC_NTHREADS build option
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#25205
opened Jul 1, 2026 by
Titaniumtown
llama: fix quantized kv-cache for dsv4
model
Model specific
#25202
opened Jul 1, 2026 by
am17an
Contributor
llama-cli: fix passing chat_template_kwargs and reasoning_format params
examples
#25201
opened Jul 1, 2026 by
percontation
Contributor
ggml-cpu: Enable tiled matmul on AIX
ggml
changes relating to the ggml tensor library for machine learning
#25199
opened Jul 1, 2026 by
shalinib-ibm
Contributor
vulkan: disable async transfer queue on amdvlk (mitigate MoE partial-offload crash)
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#25196
opened Jul 1, 2026 by
liminfei-amd
Contributor
1 task done
vulkan: Remove crash guard for Intel GPU
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
openvino: fix SWA mask detection for long prompts
ggml
changes relating to the ggml tensor library for machine learning
OpenVINO
#25189
opened Jul 1, 2026 by
zlma7001
tests: Source-level separation between llama.cpp and ggml
testing
Everything test related
#25179
opened Jun 30, 2026 by
ckastner
Collaborator
metal: add col2im_1d op (f32/f16/bf16)
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#25176
opened Jun 30, 2026 by
ServeurpersoCom
Contributor
spec: add DSpark speculative decoding
conversion
model
Model specific
testing
Everything test related
#25173
opened Jun 30, 2026 by
wjinxu
grammar : recognize '|' at start of continuation line
testing
Everything test related
#25170
opened Jun 30, 2026 by
o7si
Contributor
Add support for Laguna XS.2 & M.1
conversion
CUDA
Related to the CUDA backend
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
testing
Everything test related
#25165
opened Jun 30, 2026 by
joerowell
ggml : fix wrong transpose function for int16 data
ggml
changes relating to the ggml tensor library for machine learning
#25161
opened Jun 30, 2026 by
I3eg1nner
ggml: imatrix-aware NVFP4 quantization (scale search) + wire NVFP4 ftype
examples
ggml
changes relating to the ggml tensor library for machine learning
#25153
opened Jun 30, 2026 by
avifenesh
common, server : preserve HF file for cached models
server
#25152
opened Jun 29, 2026 by
mrexodia
CUDA: add COL2IM_1D op
CUDA
Related to the CUDA backend
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
#25151
opened Jun 29, 2026 by
Ssamdeman
speculative: fix MTP draft crash on vision inputs
#25144
opened Jun 29, 2026 by
ServeurpersoCom
Contributor
ui: strip path and weight extension from model id in single model mode
server/ui
#25137
opened Jun 29, 2026 by
ServeurpersoCom
Contributor