cmake : remove unused file (ggml/1419)
f5a8ff69
ggml : bump version to 0.9.6 (ggml/1423)
9e51bff5
Correctly fetch q8_1 quantize pipeline in test as needed by 8a3519b (…
4232eb45
opencl: add optimized q8_0 mm kernel for adreno (llama/18871)
3b0dc18b
ggml-hexagon: flash-attention and reduce-sum optimizations (llama/19141)
39e8ccb4
Bump cmake max version (needed for Windows on Snapdragon builds) (lla…
daa79d47
Remove pipeline cache mutexes (llama/19195)
ea25dc89
docs : Minor cleanups (llama/19252)
1d482034
ggml-backend: fix async set/get fallback sync (llama/19179)
a14f9b35
metal : support virtual devices (llama/18919)
d3d73702
sycl: implement GGML_OP_TOP_K (llama/19242)
00f1eede
Remove support for Nvidia & AMD GPU, because the oneAPI plugin for Nv…
80a022a2
ggml-cpu: FA split across kv for faster TG (llama/19209)
a9417b65
opencl: refactor some ops, concat, repeat, tanh and scale (llama/19226)
eb112b0a
cuda : revert CUDA_SCALE_LAUNCH_QUEUES override until investigated (l…
88544fd3
ggml: added cleanups in ggml_quantize_free (llama/19278)
f28a02c2
CUDA: Fix loop unrolling for BW in mul_mat_q_stream_k_fixup (llama/19…
52476c9b
metal : minor cleanup (llama/19251)
8adee284
CUDA: use mmvq for mul-mat-id for small batch sizes (llama/18958)
49da98a2
vulkan: disable coopmat1 fa on Nvidia Turing (llama/19290)
12c470f8
metal : add solve_tri (llama/19302)
6e83e87f
ggml-cpu: use LUT for converting e8->f32 scales on x86 (llama/19288)
569b39f2
ggml-virtgpu: make the code thread safe (llama/19204)
e9d35f70
metal : add missing includes (llama/19348)
ce06ddb9
vulkan: fix non-contig rope (llama/19299)
a51e5f80
vulkan: Set k_load_shmem to false when K is too large (llama/19301)
c109f728
vulkan: fix GPU deduplication logic. (llama/19222)
a0f98202
metal : add diag (llama/19330)
f709bc53
vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (llama…
c1bed7ec
metal : adaptive CPU/GPU interleave based on number of nodes (llama/1…
4d1e0268
cuda : cuda graphs now compare all node params (llama/19383)
33c0bccf
metal : skip loading all-zero mask (llama/19337)
9af583dd
vulkan: make FA mask/softcap enables spec constants (llama/19309)
6d777d61
vulkan: For coopmat2 FA, use fp16 accumulators for the final result (…
80c643a4
sycl: add F16 support for GGML_OP_CEIL (llama/19306)
57960038
ggml-webgpu: JIT compile binary operators and handle binding overlaps…
f8d7a7f4
metal : fix event synchronization in cpy_tensor_async (llama/19402)
22f2851f
metal : consolidate bin kernels (llama/19390)
0a356e0c
sync : ggml
a5a33c0a
talk-llama : sync llama.cpp
a4720dba
danbev approved these changes on 2026-02-07
ggerganov merged 4b23ff24 into master 24 days ago
ggerganov deleted the sync-ggml-26-02-07 branch 24 days ago