PR #3867 sync : ggml

model : support for DeepseekV32ForCausalLM with generic DeepSeek Spar…

27c26a18

CUDA: Check PTX version on host side to guard PDL dispatch (llama/23530)

a153f5b7

ggml-webgpu: add q4_0/q8_0 SET_ROWS (llama/23760)

a3ed28a0

ggml-webgpu: Check earlier for WebGPU required features (llama/23879)

8648232d

vulkan: add Flash Attention support for BFloat16 KV cache (llama/23420)

6f292442

ggml : add some lsx support (llama/23798)

976cc280

metal : restore im2col implementation for large kernels (llama/23901)

af46d7dd

opencl: support bf16 by converting to f16 (llama/23839)

d489167f

sycl : Optimize Q3_K mul_mat by reorder (llama/23725)

c407806f

Add more types in GET_ROWS OP (llama/23710)

1e323ff6

Support Q4_1, Q5_0, Q5_1 in Flash-attention (llama/23812)

2c6f70df

vulkan: Removed unused functions (llama/23175)

db67a96d

vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (lla…

2e1c642f

TP: quantized KV cache support (llama/23792)

882a7f05

vulkan: reduce host memory lock contention (llama/23376)

9ce1e488

vulkan: don't hold the device mutex while compiling pipelines (llama/…

7d65f6d0

metal: template GLU kernels to support f16/f32 (llama/23882)

422b2a52

opencl: add basic support for q5_0 and q5_1 (llama/23548)

fef47bc1

revert to using global_invocation_id for cpy shader (llama/23955)

1eed094c

opencl: fix compiler warnings for non-adreno path (llama/23922)

0171f142

clean up unused variables warnings (llama/23975)

662c59ee

hexagon: add gelu_quick (llama/24007)

68fd61e9

hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimiza…

dec39166

hexagon: profiler output fix and script updates (llama/24042)

200515c3

opencl: use flat variants of q4_K and q6_K gemv for very large M (lla…

5048c5c6

cuda: reserve space for quantize kv-cache at startup (llama/23907)

0b81d366

ggml-cpu: use runtime SVE width in FWHT (llama/24059)

585ecd31

Avoid PDL race conditions by disabling __restrict__ when PDL is used …

6b5114cf

ggml-cpu: extend RVV quantization vec dot to higher VLENs (llama/22754)

44d6bc8b

ggml-webgpu: FlashAttention refactor + standardize quantization suppo…

e4af133e

metal : reduce rset heartbeat from 500ms -> 5ms (llama/24074)

9e632475

ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (llama/22209)

46cf4a33

sycl : port multi-column MMVQ from CUDA backend (llama/21845)

edf2bd08

CUDA: enroll mul_mat_vec_q_moe into pdl (llama/24087)

656861e6

kleidiai : dynamic chunk-based scheduling for hybrid execution (llam…

0dacd866

vulkan: add fwht support for Intel with shmem reduction (llama/23964)

4f05b2f6

opencl: improve get_rows, cpy, concat and q6_k flat gemv (llama/24160)

ccb78de2

vulkan: check coopmat2 features before reporting support (llama/24186)

c016904b

metal : fix im2col 1D case (audio models) (llama/24220)

98b7f6eb

HIP: add gfx1152 and gfx1153 to RDNA3.5 (llama/24129)

abb7a0a0

sync : ggml

c5458078

ggml : bump version to 0.14.0 (ggml/1533)

38f9ea3a

sync : ggml

126f0b0b

talk-llama : sync llama.cpp

eb292f72

danbev approved these changes on 2026-06-08

ggerganov merged 84bd03a4 into master 10 days ago

ggerganov deleted the sync-ggml-26-06-08 branch 10 days ago

whisper.cpp sync : ggml #3867 (Merged)