Pull requests: ggml-org/llama.cpp

Pull requests list

server: add --no-sleep flag for GPU heartbeat on headless GPUs [CUDA, ggml, server, SYCL, Vulkan]
#25214 opened Jul 1, 2026 by johnkarlhill
rocm: fix mmap loading of large models [CUDA, ggml]
#25212 opened Jul 1, 2026 by pwilkin (Member)
Optimize RWKV7 inference by fusing some graph operators [Apple Metal, CUDA, ggml, model, SYCL, testing, Vulkan]
#25206 opened Jul 1, 2026 by MollySophia (Collaborator) · Draft
sycl: add GGML_SYCL_FATTN_VEC_NTHREADS build option [ggml, SYCL]
#25205 opened Jul 1, 2026 by Titaniumtown
llama: fix quantized kv-cache for dsv4 [model]
#25202 opened Jul 1, 2026 by am17an (Contributor)
ggml-cpu: Enable tiled matmul on AIX [ggml]
#25199 opened Jul 1, 2026 by shalinib-ibm (Contributor)
vulkan: disable async transfer queue on amdvlk (mitigate MoE partial-offload crash) [ggml, Vulkan]
#25196 opened Jul 1, 2026 by liminfei-amd (Contributor) · 1 task done
vulkan: Remove crash guard for Intel GPU [ggml, Vulkan]
#25192 opened Jul 1, 2026 by rillomas (Contributor) · Draft
openvino: fix SWA mask detection for long prompts [ggml, OpenVINO]
#25189 opened Jul 1, 2026 by zlma7001
CUDA/HIP: add Q2_0 (PrismML ternary 1.58-bit) support [CUDA, ggml]
#25188 opened Jul 1, 2026 by The-Monk · Draft
spec: add backend sampling for DFlash
#25180 opened Jun 30, 2026 by ruixiang63 (Member) · Draft
tests: Source-level separation between llama.cpp and ggml [testing]
#25179 opened Jun 30, 2026 by ckastner (Collaborator)
metal: add col2im_1d op (f32/f16/bf16) [Apple Metal, ggml]
#25176 opened Jun 30, 2026 by ServeurpersoCom (Contributor)
spec: add DSpark speculative decoding conversion [model, testing]
#25173 opened Jun 30, 2026 by wjinxu
grammar : recognize '|' at start of continuation line [testing]
#25170 opened Jun 30, 2026 by o7si (Contributor)
hexagon: allow dflash lm-head offload experiment [examples, ggml, Hexagon, model]
#25166 opened Jun 30, 2026 by Salanfeng
Add support for Laguna XS.2 & M.1 conversion [CUDA, ggml, model, testing]
#25165 opened Jun 30, 2026 by joerowell
ggml : fix wrong transpose function for int16 data [ggml]
#25161 opened Jun 30, 2026 by I3eg1nner
cuda: fix crash when querying memory on device with no free memory. [CUDA, ggml]
#25157 opened Jun 30, 2026 by cphlipot
ggml: imatrix-aware NVFP4 quantization (scale search) + wire NVFP4 ftype [examples, ggml]
#25153 opened Jun 30, 2026 by avifenesh
CUDA: add COL2IM_1D op [CUDA, documentation, ggml]
#25151 opened Jun 29, 2026 by Ssamdeman
speculative: fix MTP draft crash on vision inputs
#25144 opened Jun 29, 2026 by ServeurpersoCom (Contributor)