sync : ggml #3148

ggerganov merged 21 commits into master from sync-ggml-25-05-13
sycl: addressing non-contiguous src1 mul_mats (nc and batched) (llama…
0c4a2291
jeffbolznv vulkan: Allow up to 4096 elements for mul_mat_id row_ids (llama/13326)
19d8d9a9
rgerganov rpc : add rpc_msg_set_tensor_hash_req (llama/13353)
00c80567
JohannesGaessler CUDA: fix crash on large batch size for MoE models (llama/13384)
f8c75dc4
JohannesGaessler CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)
aef59f48
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (llama/12…
b493e03b
jeffbolznv vulkan: scalar flash attention implementation (llama/13324)
22f4997d
JohannesGaessler CUDA: fix FlashAttention on Turing (llama/13415)
04445664
JohannesGaessler CUDA: fix race conditions FlashAttention kernels (llama/13438)
86dece9c
hjc4869 Add `--no-op-offload` to improve `-ot` pp perf in MoE models like lla…
0b1962a1
JohannesGaessler CUDA: fix crash with partial offloading of MoE (llama/13439)
c4268297
AD2605 enable dpcpp nightly builds with libraries (llama/13406)
882d9757
JohannesGaessler CUDA: fix misaligned synchronization in FA (llama/13469)
8264872b
eddnjjn ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (llama/13053)
cb90cb09
JohannesGaessler llama/ggml: add LLM training support (llama/10544)
fe0d52b9
lhez opencl: remove unnecessary assert for `add` (llama/13257)
43a59ecc
ggerganov metal : optimize MoE for large batches (llama/13388)
926e06db
ngxson ggml : add mrope kernel for metal (llama/13457)
79fb43e2
ggerganov sync : ggml
89970b9a
ggerganov whisper : update to ggml-backend changes (#0)
69753804
ggerganov talk-llama : sync llama.cpp
bff8dc24
danbev approved these changes on 2025-05-13
ggerganov merged f8905605 into master 259 days ago
