sync : ggml #3652

ggerganov merged 40 commits into master from sync-ggml-26-02-07
ggerganov cmake : remove unused file (ggml/1419)
f5a8ff69
ggerganov ggml : bump version to 0.9.6 (ggml/1423)
9e51bff5
sredman Correctly fetch q8_1 quantize pipeline in test as needed by 8a3519b (…
4232eb45
shaofeiqi opencl: add optimized q8_0 mm kernel for adreno (llama/18871)
3b0dc18b
chraac ggml-hexagon: flash-attention and reduce-sum optimizations (llama/19141)
39e8ccb4
max-krasnyansky Bump cmake max version (needed for Windows on Snapdragon builds) (lla…
daa79d47
nikhilJain17 Remove pipeline cache mutexes (llama/19195)
ea25dc89
ckastner docs : Minor cleanups (llama/19252)
1d482034
JohannesGaessler ggml-backend: fix async set/get fallback sync (llama/19179)
a14f9b35
ggerganov metal : support virtual devices (llama/18919)
d3d73702
tdevelope sycl: implement GGML_OP_TOP_K (llama/19242)
00f1eede
arthw Remove support for Nvidia & AMD GPU, because the oneAPI plugin for Nv…
80a022a2
am17an ggml-cpu: FA split across kv for faster TG (llama/19209)
a9417b65
lhez opencl: refactor some ops, concat, repeat, tanh and scale (llama/19226)
eb112b0a
gaugarg-nv cuda : revert CUDA_SCALE_LAUNCH_QUEUES override until investigated (l…
88544fd3
noctrex ggml: added cleanups in ggml_quantize_free (llama/19278)
f28a02c2
ORippler CUDA: Fix loop unrolling for BW in mul_mat_q_stream_k_fixup (llama/19…
52476c9b
ggerganov metal : minor cleanup (llama/19251)
8adee284
am17an CUDA: use mmvq for mul-mat-id for small batch sizes (llama/18958)
49da98a2
0cc4m vulkan: disable coopmat1 fa on Nvidia Turing (llama/19290)
12c470f8
ggerganov metal : add solve_tri (llama/19302)
6e83e87f
am17an ggml-cpu: use LUT for converting e8->f32 scales on x86 (llama/19288)
569b39f2
kpouget ggml-virtgpu: make the code thread safe (llama/19204)
e9d35f70
will-lms metal : add missing includes (llama/19348)
ce06ddb9
jeffbolznv vulkan: fix non-contig rope (llama/19299)
a51e5f80
jeffbolznv vulkan: Set k_load_shmem to false when K is too large (llama/19301)
c109f728
okuvshynov vulkan: fix GPU deduplication logic. (llama/19222)
a0f98202
ggerganov metal : add diag (llama/19330)
f709bc53
jeffbolznv vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (llama…
c1bed7ec
ggerganov metal : adaptive CPU/GPU interleave based on number of nodes (llama/1…
4d1e0268
ggerganov cuda : cuda graphs now compare all node params (llama/19383)
33c0bccf
ggerganov metal : skip loading all-zero mask (llama/19337)
9af583dd
jeffbolznv vulkan: make FA mask/softcap enables spec constants (llama/19309)
6d777d61
jeffbolznv vulkan: For coopmat2 FA, use fp16 accumulators for the final result (…
80c643a4
NechamaKrashinski sycl: add F16 support for GGML_OP_CEIL (llama/19306)
57960038
abhijitramesh ggml-webgpu: JIT compile binary operators and handle binding overlaps…
f8d7a7f4
ggerganov metal : fix event synchronization in cpy_tensor_async (llama/19402)
22f2851f
ggerganov metal : consolidate bin kernels (llama/19390)
0a356e0c
ggerganov sync : ggml
a5a33c0a
ggerganov talk-llama : sync llama.cpp
a4720dba
danbev approved these changes on 2026-02-07
ggerganov merged 4b23ff24 into master 24 days ago
ggerganov deleted the sync-ggml-26-02-07 branch 24 days ago
