ggml-cpu: add repack for mxfp4 (llama/19738)
8f3aa2fd
CUDA: add CDNA3 MFMA support for flash attention MMA kernel (llama/19…
50646889
cuda: cap grid.y at 65535 in non-contiguous dequantize/convert kernel…
681debff
vulkan: improve partial offloading performance on AMD (llama/19976)
91e9af83
ggml-cpu: optimise s390x multiply extend instructions (llama/20032)
1810a80b
vulkan: tune MMVQ for Intel Windows (llama/19988)
47b12eae
ggml-webgpu: Support non-contiguous `src0` and overlapping `src0/src1…
84e45a60
ggml webgpu: Clean up per-thread parameter buffer pool and job submis…
8be81d37
ggml webgpu: fix workgroup dispatch limit for large batch sizes (llam…
a444c8a0
opencl: add optimized q4_1 mm kernel for adreno (llama/19840)
d89fc23b
kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 (llama/…
2eeb5e3a
ggml : use a simple std::thread in AMX without OpenMP (llama/20074)
1a9b0f9f
ggml: fix ggml_is_contiguous_n for ne == 1 (llama/20092)
ffe593bb
Add concat op to webgpu. (llama/20068)
c456e26e
Fix wait logic for inflight jobs (llama/20096)
6ae853c3
opencl: add `SET`, support i32 for `CPY`, minor refactor for cpy (lla…
38cc52c1
hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and M…
0964d663
chore : correct typos [no ci] (llama/20041)
484163ec
CUDA: Improve performance via less synchronizations between token (ll…
22502c04
hexagon: add fp16 support for binary ops: add,sub,mul,div (llama/20139)
15ff3f5b
opencl: add neg, exp and diag (llama/20127)
dc6b7229
ggml-cpu: fix data race for debug asserts (llama/20148)
a27e51cb
CUDA: use shared mem for ssm_conv (llama/20128)
cbd9e948
ggml-cpu: Fix gcc 15 ICE on ppc64le (ggml/20083) (llama/20130)
96fb6151
ggml: update comments for backends which have no memory to report (ll…
7f38f2a2
ggml-cuda: add mem check for fusion (llama/19916)
6e34b302
cpu: skip redundant ROPE cache updates (llama/20149)
9bfd03b9
hexagon: add f32 ssm_conv op (llama/20122)
d5ea0591
quants : Add memsets and other fixes for IQ quants (llama/19861)
94de6807
opencl: add l2_norm (llama/20160)
78531aa0
ggml: add GATED_DELTA_NET op (llama/19504)
bc20d1ab
support Flash Attention for fp32/fp16/Q4/Q5/Q8 (llama/20190)
b61009bb
vulkan: Fix data races in coopmat1 mul_mat(_id) (llama/20084)
df65a360
ggml-vulkan: Add ELU op support (llama/20183)
18a8e3d5
cuda : display total and free VRAM capacity during device initializat…
47ba98e1
vulkan: skip zero size tensors in backend copies (llama/20233)
e8501ded
ggml-vulkan: add SGN operator, auto-generate Vulkan.csv and ops.md (l…
f328ac96
ggml-cuda: disable gdn for musa (llama/20278)
e307d934
metal : add upscale (llama/20284)
dac5a06a
metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (llama/20250)
6a1de6f8
metal: handle command buffer failures gracefully in synchronize (llam…
d2a96485
ggml-cpu: add RVV repack GEMM and GEMV for quantization types (llama/…
ef15437e
kleidiai : support for concurrent sme and neon kernel execution (llam…
e155c541
ggml webgpu: faster normal quant and some k-quant matrix operations, …
bf2d45a6
ggml : bump RPC version (llama/20330)
f1265f13
fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (llam…
ec414e9d
fix op rope, add rope_back (llama/20293)
e3db9561
cuda/hip: fix loop unrolling in ssm-conv (llama/20369)
ff5fa922
ggml-cuda: gdn use shared mem for HIP (llama/20366)
42df97a1
metal : add env var to trigger graph capture (llama/20398)
80655c9a
metal : fix q5_k mul_mv register spill (llama/20399)
dafed294
metal : fix capture_compute counter logic (llama/20410)
ed7718a3
llama : add support for Nemotron 3 Super (llama/20411)
e19ec84c
ggml : add NVFP4 quantization type support (llama/19769)
e56a3c19
llama : enable chunked fused GDN path (llama/20340)
9f32749f
ggml-webgpu: Add support for `GGML_OP_REPEAT` (llama/20230)
0ec906b9
hip: compile debug builds with -O2 on hip to avoid a compiler bug (ll…
20e71614
opencl: add cumsum op (llama/18981)
c8fc662f
opencl: use larger workgroup size for get_rows (llama/20316)
b14f9f58
vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large mode…
00917d94
vulkan: fix OOB check in flash_attn_mask_opt (llama/20296)
812c45ee
vulkan: fix l2_norm epsilon handling (llama/20350)
8ca1798d
sync : ggml
570e1466
metal : avoid divisions in bin kernel (llama/20426)
e18c6d28
sync : ggml
606ee599
vulkan: fix SSM_CONV PP scaling with large ubatch sizes (llama/20379)
e057ff00
vulkan: add GATED_DELTA_NET op support (llama/20334)
f25cbd60
llama : disable graph reuse with pipeline parallelism (llama/20463)
f5025aec
metal : fix l2 norm scale (llama/20493)
8a33e87a
ggml : fix typo gmml (llama/20512)
7f44d6b7
ggml-cpu: add RVV vec dot kernels for quantization types (llama/18859)
5022e7eb
graph : remove redundant GDN state transposes (llama/20443)
97854bea
opencl: fix l2_norm (llama/20480)
deaa2db4
Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT ope…
bdc9bac0
ggml : add OpenVINO backend (llama/15307)
7a14c004
Use fp32 in cuBLAS V100 to avoid overflows, env variables to override…
cfa02a01
ggml : add native AVX512-FP16 support for F16 operations (llama/20529)
5389551b
add op gated_delta_net (llama/20455)
b3106ec5
hexagon: Q4_0 and MXFP4 repack fixes (llama/20527)
21773e33
metal : add FA specialization for HSK = 320, HSV = 256 (llama/20549)
d752d737
vulkan: use graphics queue on AMD (llama/20551)
c43c45c5
cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (llama…
f9cd8348
ggml : guard against sumq2 being 0 in IQ4_NL (llama/20460)
25a75674
ggml/hip: fix APU compatibility - soft error handling for hipMemAdvis…
603bc50d
ggml: avoid creating CUDA context during device init (llama/20595)
d79032c9
CUDA: limit number of FA stream-k CUDA blocks (llama/20586)
491d5129
common : add nvfp4 (ggml/0)
4ebc6f58
ggml : extend im2col f16 (ggml/1434)
83fed291
sync : ggml
2f41a39b
talk-llama : sync llama.cpp
678c2b42
danbev approved these changes on 2026-03-16
ggml : try fix arm build (#0)
ae853f46
ggerganov merged 27fa2077 into master 73 days ago
ggerganov deleted the sync-ggml-26-03-16 branch 73 days ago