CUDA: faster tile FA, add oob checks, more HSs (llama/16492)
791e60a6
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (llama/16…
7847625f
ggml : Fix FP16 ELU positive branch (llama/16519)
45e26a5f
fix UT fault cases: count-equal, argsort, pad OPs (llama/16521)
99d07411
metal : add opt_step_adamw and op_sum (llama/16529)
1a08f1d9
CANN: Update several operators to support FP16 data format (llama/16251)
f9de4e2a
ggml : fix scalar path for computing norm (llama/16558)
6dd86087
metal: add support for opt_step_sgd (llama/16539)
5ef11175
CANN: fix CPU memory leak in CANN backend (llama/16549)
b7c7d0c7
ggml : fix build broken with -march=armv9-a on MacOS (llama/16520)
7ce6c536
CUDA: fix numerical issues in tile FA kernel (llama/16540)
e2b9c209
opencl: fix build targeting CL 2 (llama/16554)
6839554b
metal : FA support F32 K and V and head size = 32 (llama/16531)
d98a1645
cuda : remove legacy copy-op pointer indirection code (llama/16485)
d541d242
CUDA: add fp kernel for larger batch size MoE (llama/16512)
17d67cab
CUDA: use fastdiv + ggml_cuda_mad for mmvf (llama/16557)
360acc78
CUDA: enable FA for FP32 KV cache (llama/16546)
395008b4
vulkan: Improve build time for MSVC (llama/16545)
0f82a3c5
vulkan: Support FA with K/V in F32 (llama/16543)
c03c4348
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion …
296f8c0b
vulkan: Add ACC_TYPE_VEC2 implementation (llama/16203)
84521445
sync : ggml
c5d5a808
talk-llama : sync llama.cpp
2eb25b13
danbev approved these changes on 2025-10-15
ggerganov merged 8ba3c13b into master 156 days ago
ggerganov deleted the sync-ggml-25-10-14 branch 156 days ago