sync : ggml #3867

ggerganov merged 44 commits into master from sync-ggml-26-06-08
fairydreaming model : support for DeepseekV32ForCausalLM with generic DeepSeek Spar…
27c26a18
ORippler CUDA: Check PTX version on host side to guard PDL dispatch (llama/23530)
a153f5b7
reeselevine ggml-webgpu: add q4_0/q8_0 SET_ROWS (llama/23760)
a3ed28a0
reeselevine ggml-webgpu: Check earlier for WebGPU required features (llama/23879)
8648232d
0cc4m vulkan: add Flash Attention support for BFloat16 KV cache (llama/23420)
6f292442
MQ-mengqing ggml : add some lsx support (llama/23798)
976cc280
ggerganov metal : restore im2col implementation for large kernels (llama/23901)
af46d7dd
lhez opencl: support bf16 by converting to f16 (llama/23839)
d489167f
arthw sycl : Optimize Q3_K mul_mat by reorder (llama/23725)
c407806f
arthw Add more types in GET_ROWS OP (llama/23710)
1e323ff6
arthw Support Q4_1, Q5_0, Q5_1 in Flash-attention (llama/23812)
2c6f70df
winstonma vulkan: Removed unused functions (llama/23175)
db67a96d
TheBlueMatt vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (lla…
2e1c642f
JohannesGaessler TP: quantized KV cache support (llama/23792)
882a7f05
winstonma vulkan: reduce host memory lock contention (llama/23376)
9ce1e488
jeffbolznv vulkan: don't hold the device mutex while compiling pipelines (llama/…
7d65f6d0
shrivasshankar metal: template GLU kernels to support f16/f32 (llama/23882)
422b2a52
shaofeiqi opencl: add basic support for q5_0 and q5_1 (llama/23548)
fef47bc1
yomaytk revert to using global_invocation_id for cpy shader (llama/23955)
1eed094c
lhez opencl: fix compiler warnings for non-adreno path (llama/23922)
0171f142
anavp-nvidia clean up unused variables warnings (llama/23975)
662c59ee
tboinovski1 hexagon: add gelu_quick (llama/24007)
68fd61e9
max-krasnyansky hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimiza…
dec39166
max-krasnyansky hexagon: profiler output fix and script updates (llama/24042)
200515c3
lhez opencl: use flat variants of q4_K and q6_K gemv for very large M (lla…
5048c5c6
am17an cuda: reserve space for quantize kv-cache at startup (llama/23907)
0b81d366
chaxu01 ggml-cpu: use runtime SVE width in FWHT (llama/24059)
585ecd31
aendk Avoid PDL race conditions by disabling __restrict__ when PDL is used …
6b5114cf
rehan-10xengineer ggml-cpu: extend RVV quantization vec dot to higher VLENs (llama/22754)
44d6bc8b
reeselevine ggml-webgpu: FlashAttention refactor + standardize quantization suppo…
e4af133e
ggerganov metal : reduce rset heartbeat from 500ms -> 5ms (llama/24074)
9e632475
sirohikartik ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (llama/22209)
46cf4a33
masonmilby sycl : port multi-column MMVQ from CUDA backend (llama/21845)
edf2bd08
ORippler CUDA: enroll mul_mat_vec_q_moe into pdl (llama/24087)
656861e6
chaxu01 kleidiai : dynamic chunk-based scheduling for hybrid execution (llam…
0dacd866
0cc4m vulkan: add fwht support for Intel with shmem reduction (llama/23964)
4f05b2f6
lhez opencl: improve get_rows, cpy, concat and q6_k flat gemv (llama/24160)
ccb78de2
0cc4m vulkan: check coopmat2 features before reporting support (llama/24186)
c016904b
ngxson metal : fix im2col 1D case (audio models) (llama/24220)
98b7f6eb
harkgill-amd HIP: add gfx1152 and gfx1153 to RDNA3.5 (llama/24129)
abb7a0a0
ggerganov sync : ggml
c5458078
ggerganov ggml : bump version to 0.14.0 (ggml/1533)
38f9ea3a
ggerganov sync : ggml
126f0b0b
ggerganov talk-llama : sync llama.cpp
eb292f72
danbev approved these changes on 2026-06-08
ggerganov merged 84bd03a4 into master 10 days ago
ggerganov deleted the sync-ggml-26-06-08 branch 10 days ago
