sync : ggml #3652

ggerganov merged 40 commits into master from sync-ggml-26-02-07
ggerganov cmake : remove unused file (ggml/1419)
f5a8ff69
ggerganov ggml : bump version to 0.9.6 (ggml/1423)
9e51bff5
sredman Correctly fetch q8_1 quantize pipeline in test as needed by 8a3519b (…
4232eb45
shaofeiqi opencl: add optimized q8_0 mm kernel for adreno (llama/18871)
3b0dc18b
chraac ggml-hexagon: flash-attention and reduce-sum optimizations (llama/19141)
39e8ccb4
max-krasnyansky Bump cmake max version (needed for Windows on Snapdragon builds) (lla…
daa79d47
nikhilJain17 Remove pipeline cache mutexes (llama/19195)
ea25dc89
ckastner docs : Minor cleanups (llama/19252)
1d482034
JohannesGaessler ggml-backend: fix async set/get fallback sync (llama/19179)
a14f9b35
ggerganov metal : support virtual devices (llama/18919)
d3d73702
tdevelope sycl: implement GGML_OP_TOP_K (llama/19242)
00f1eede
arthw Remove support for Nvidia & AMD GPU, because the oneAPI plugin for Nv…
80a022a2
am17an ggml-cpu: FA split across kv for faster TG (llama/19209)
a9417b65
lhez opencl: refactor some ops, concat, repeat, tanh and scale (llama/19226)
eb112b0a
gaugarg-nv cuda : revert CUDA_SCALE_LAUNCH_QUEUES override until investigated (l…
88544fd3
noctrex ggml: added cleanups in ggml_quantize_free (llama/19278)
f28a02c2
ORippler CUDA: Fix loop unrolling for BW in mul_mat_q_stream_k_fixup (llama/19…
52476c9b
ggerganov metal : minor cleanup (llama/19251)
8adee284
am17an CUDA: use mmvq for mul-mat-id for small batch sizes (llama/18958)
49da98a2
0cc4m vulkan: disable coopmat1 fa on Nvidia Turing (llama/19290)
12c470f8
ggerganov metal : add solve_tri (llama/19302)
6e83e87f
am17an ggml-cpu: use LUT for converting e8->f32 scales on x86 (llama/19288)
569b39f2
kpouget ggml-virtgpu: make the code thread safe (llama/19204)
e9d35f70
will-lms metal : add missing includes (llama/19348)
ce06ddb9
jeffbolznv vulkan: fix non-contig rope (llama/19299)
a51e5f80
jeffbolznv vulkan: Set k_load_shmem to false when K is too large (llama/19301)
c109f728
okuvshynov vulkan: fix GPU deduplication logic. (llama/19222)
a0f98202
ggerganov metal : add diag (llama/19330)
f709bc53
jeffbolznv vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (llama…
c1bed7ec
ggerganov metal : adaptive CPU/GPU interleave based on number of nodes (llama/1…
4d1e0268
ggerganov cuda : cuda graphs now compare all node params (llama/19383)
33c0bccf
ggerganov metal : skip loading all-zero mask (llama/19337)
9af583dd
jeffbolznv vulkan: make FA mask/softcap enables spec constants (llama/19309)
6d777d61
jeffbolznv vulkan: For coopmat2 FA, use fp16 accumulators for the final result (…
80c643a4
NechamaKrashinski sycl: add F16 support for GGML_OP_CEIL (llama/19306)
57960038
abhijitramesh ggml-webgpu: JIT compile binary operators and handle binding overlaps…
f8d7a7f4
ggerganov metal : fix event synchronization in cpy_tensor_async (llama/19402)
22f2851f
ggerganov metal : consolidate bin kernels (llama/19390)
0a356e0c
ggerganov sync : ggml
a5a33c0a
ggerganov talk-llama : sync llama.cpp
a4720dba
danbev approved these changes on 2026-02-07
ggerganov merged 4b23ff24 into master 24 days ago
ggerganov deleted the sync-ggml-26-02-07 branch 24 days ago
