HIP: Refactor mma for RDNA and CDNA (llama/17990)
cde9758b
ggml-cpu: ARM64: repack version of q8_0 (dotprod and i8mm) (llama/18096)
126deb39
ggml-hexagon: gelu operation (llama/17921)
2d9e9d54
ggml-hexagon: swiglu_oai operation (llama/18114)
013ebf19
remove i_major_dual (llama/18157)
b03ed79b
ggml-cpu: extend support for RVV floating-point kernels (llama/17318)
f9d36bd8
model : add ASR support for LFM2-Audio-1.5B (conformer) (llama/18106)
f09b84b0
vulkan: Add perf logger mode with concurrency (llama/17944)
baa5c1db
ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for mor…
5a1735a7
Added comments explaining thread block size selection logic based on …
1dd0dfee
Vulkan: some improvement on mul_mat_iq2_xs (llama/18031)
216ddd8a
vulkan: in graph_optimize, try to group ADD operations (llama/18060)
3218970f
vulkan: support GGML_UNARY_OP_XIELU (llama/18062)
9a60d12d
vulkan/cuda: fix topk_moe with exp_probs_b (llama/18071)
5506ddb2
vulkan: fix im2col overflowing maxworkgroupcount (llama/18180)
e5d89abd
llama: fix RPC for -fit on (llama/18233)
092caa4c
vulkan: Implement set_tensor_async and the event interfaces (llama/18…
129f9631
vulkan: Extend rope fusions to allow mrope (llama/18264)
ad5a5115
opencl: unpack q4_0 for adreno in get_tensor (llama/18278)
ca5e155b
llamafile: add rvv support for sgemm kernels (llama/18199)
06b4ca96
ggml-hexagon: gelu optimization (llama/18151)
de7f933f
ggml-hexagon: create generalized functions for cpu side op (llama/17500)
81f95754
rpc : add check for rpc buffer type (llama/18242)
f5789f82
CANN: Uses yarn_ramp cache in ROPE (llama/17725)
56edad8e
vulkan: use fewer FA rows for small cache runs (llama/18280)
61872d57
CANN : refactor ACL graph cache (llama/17752)
de40463b
vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (…
4137b226
CUDA: experimental native mxfp4 support for blackwell (llama/17906)
32cc57d3
ggml : optimize cuda cumsum fallback kernel (llama/18343)
17c42517
CANN: Add support for CONV_TRANSPOSE_1D when kernel size > 255 (llama…
a7b7da6f
ggml-cuda: fix blackwell native builds (llama/18361)
e1edaffb
cuda: optimize cumsum cub path (llama/18362)
dc1099c2
ggml-cuda: fix regex for arch list (llama/18371)
bc820e90
CANN: implement the SSM_CONV operator (llama/17737)
7c6e891b
vulkan: handle rope with large number of rows (llama/18306)
a1cc693c
vulkan: Support UPSCALE w/antialias (llama/18327)
09bd04f3
vulkan: small dequantization improvements (llama/18380)
b311303e
vulkan: Use BK=32 for coopmat2 mul_mat_id (llama/18332)
d78f3150
vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (llama/18349)
f8dfe0ab
vulkan: preprocess mul_mat_id experts and discard workgroups more qui…
d63cb2af
ggml-cuda: Use same regex for GGML_NATIVE=OFF (llama/18407)
78b9d734
opencl: allow resizing transpose buffers (llama/18384)
6a9983bf
ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATIVE=ON (l…
2971350e
cmake: Added more x86_64 CPU backends when building with `GGML_CPU_AL…
c5ef983c
rpc: fix segfault on invalid endpoint format (llama/18387)
76efccb1
Revert "ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATI…
d50bcb10
HIP: Use mmq on MFMA devices for MUL_MAT_ID in cases where a lot of s…
60f59a1b
cuda: fix race condition in cumsum (llama/18448)
dccb3be2
CUDA: Blackwell features for non-native builds (llama/18436)
4795bbf8
CUDA: fix replacement of bad archs in CMake (llama/18457)
a29fc211
CUDA: add log line when mxfp4 acceleration is used (llama/18483)
db760ebb
kleidiai: add and integrate SVE 256-bit vector-length kernel (llama/1…
5f634957
Work around broken IntelSYCLConfig.cmake in Intel oneAPI 2025.x (llam…
46246528
sycl: add newline at the end of CMakeLists.txt (llama/18503)
a1780c7c
metal : remove BF16 x F16 kernels (llama/18456)
3aad8322
CUDA: fix KQ max calculation (llama/18487)
60add562
metal : add count_equal op (llama/18314)
94169eac
sync : ggml
3b318610
talk-llama : sync llama.cpp
9faea66e
danbev approved these changes on 2025-12-31
ggerganov merged 7359ac94 into master 13 days ago
ggerganov deleted the sync-ggml-25-12-31 branch 13 days ago