sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (llama/21283)
df72261d
ggml-webgpu: add vectorized flash attention (llama/20709)
e68c2916
rpc : reuse compute graph buffers (llama/21299)
f24ad1e4
ggml-zendnn : add MUL_MAT_ID op support for MoE models (llama/21315)
c483ff67
ggml-webgpu: move from parameter buffer pool to single buffer with of…
44dc697a
hexagon: slight optimization for argsort output init (llama/21463)
799db709
sycl : handle other FA case (llama/21377)
88b4f515
Write an optimized flash_attn_stream_k_fixup kernel (llama/21159)
efd3b513
ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273)
b8621039
ggml-webgpu: Add the support of `MUL_MAT_ID` (llama/21147)
2f4c4462
Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (llama/21…
983da6a6
ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) …
3f81977b
vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (llama/21029)
dc2f14ad
ggml: Vulkan build, Linux -- output error string for errno on fork fa…
7ee5f7c9
ggml : deprecate GGML_OP_ADD1 (llama/21363)
6bf0eec7
CUDA: check for buffer overlap before fusing (llama/21566)
f2a19d4a
ggml-webgpu: parameterize submission size and add iOS specific limits…
fa761cd3
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (llama/21168)
ac52cf58
CUDA: make cuda graphs props check faster (llama/21472)
c3bf23a2
metal: Q1_0 backend (llama/21528)
4969090e
webgpu : Query for adapter support when registering WebGPU backend (l…
07dc4a68
fix: free ctx_copy in ggml_opt_free to plug per-training-session leak…
c9e99715
CUDA: also store `node->src->data` ptrs for equality check (llama/21635)
607fbdfc
vulkan: unify type macros to use Vx instead of _VECx (llama/21605)
0fe04584
sycl : add flash-attn support for head size 512 (llama/21654)
ccb3f958
metal : add missing mm-id specializations for q1_0 (llama/21662)
677816c5
sync : llama.cpp
5c1d8e02
danbev approved these changes on 2026-04-09
ggerganov merged 58c38058 into master 81 days ago
ggerganov deleted the sync-llama.cpp-26-04-09 branch 81 days ago