model : support for DeepseekV32ForCausalLM with generic DeepSeek Spar…
27c26a18
CUDA: Check PTX version on host side to guard PDL dispatch (llama/23530)
a153f5b7
ggml-webgpu: add q4_0/q8_0 SET_ROWS (llama/23760)
a3ed28a0
ggml-webgpu: Check earlier for WebGPU required features (llama/23879)
8648232d
vulkan: add Flash Attention support for BFloat16 KV cache (llama/23420)
6f292442
ggml : add some lsx support (llama/23798)
976cc280
metal : restore im2col implementation for large kernels (llama/23901)
af46d7dd
opencl: support bf16 by converting to f16 (llama/23839)
d489167f
sycl : Optimize Q3_K mul_mat by reorder (llama/23725)
c407806f
Add more types in GET_ROWS OP (llama/23710)
1e323ff6
Support Q4_1, Q5_0, Q5_1 in Flash-attention (llama/23812)
2c6f70df
vulkan: Removed unused functions (llama/23175)
db67a96d
vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (lla…
2e1c642f
TP: quantized KV cache support (llama/23792)
882a7f05
vulkan: reduce host memory lock contention (llama/23376)
9ce1e488
vulkan: don't hold the device mutex while compiling pipelines (llama/…
7d65f6d0
metal: template GLU kernels to support f16/f32 (llama/23882)
422b2a52
opencl: add basic support for q5_0 and q5_1 (llama/23548)
fef47bc1
revert to using global_invocation_id for cpy shader (llama/23955)
1eed094c
opencl: fix compiler warnings for non-adreno path (llama/23922)
0171f142
clean up unused variables warnings (llama/23975)
662c59ee
hexagon: add gelu_quick (llama/24007)
68fd61e9
hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimiza…
dec39166
hexagon: profiler output fix and script updates (llama/24042)
200515c3
opencl: use flat variants of q4_K and q6_K gemv for very large M (lla…
5048c5c6
cuda: reserve space for quantize kv-cache at startup (llama/23907)
0b81d366
ggml-cpu: use runtime SVE width in FWHT (llama/24059)
585ecd31
Avoid PDL race conditions by disabling __restrict__ when PDL is used …
6b5114cf
ggml-cpu: extend RVV quantization vec dot to higher VLENs (llama/22754)
44d6bc8b
ggml-webgpu: FlashAttention refactor + standardize quantization suppo…
e4af133e
metal : reduce rset heartbeat from 500ms -> 5ms (llama/24074)
9e632475
ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (llama/22209)
46cf4a33
sycl : port multi-column MMVQ from CUDA backend (llama/21845)
edf2bd08
CUDA: enroll mul_mat_vec_q_moe into pdl (llama/24087)
656861e6
kleidiai : dynamic chunk-based scheduling for hybrid execution (llam…
0dacd866
vulkan: add fwht support for Intel with shmem reduction (llama/23964)
4f05b2f6
opencl: improve get_rows, cpy, concat and q6_k flat gemv (llama/24160)
ccb78de2
vulkan: check coopmat2 features before reporting support (llama/24186)
c016904b
metal : fix im2col 1D case (audio models) (llama/24220)
98b7f6eb
HIP: add gfx1152 and gfx1153 to RDNA3.5 (llama/24129)
abb7a0a0
sync : ggml
c5458078
ggml : bump version to 0.14.0 (ggml/1533)
38f9ea3a
sync : ggml
126f0b0b
talk-llama : sync llama.cpp
eb292f72
danbev approved these changes on 2026-06-08
ggerganov merged 84bd03a4 into master 10 days ago
ggerganov deleted the sync-ggml-26-06-08 branch 10 days ago