CUDA: Fix non-contig rope (llama/19338)
b9c178d4
cuda : extend GGML_OP_PAD to work with non-cont src0 (llama/19429)
0cbf122c
CANN: implement quantized MUL_MAT_ID for MoE models (llama/19228)
fd5cfe14
CANN: Remove unnecessary wrapper for `gml_backend_buft_is_cann` (llam…
30b52c6a
ggml : use noexcept overload for is_regular_file in backend registrat…
2f2b2f5d
ggml-cpu: arm64: q6_K repack gemm and gemv (and generic) implementati…
0d0bf3a9
Plug memory leaks and free resources on shutdown (llama/19315)
3415a2e4
CUDA : Update CCCL-tag for 3.2 to final release from RC (llama/19486)
99418ad3
metal : consolidate unary ops (llama/19490)
be449af3
ggml : extend bin bcast for permuted src1 (llama/19484)
012bd607
hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (llama/19406)
b26b2b19
metal : extend l2_norm support for non-cont src0 (llama/19502)
81aa77c3
ggml : unary ops support non-cont src0 + metal F16 unary ops (llama/1…
6f46caea
opencl: add general Q6_K mm and Q4_K mv (llama/19347)
bf58e6f3
hexagon: further optimization and tuning of matmul and dot kernels (l…
faa42605
Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (llama/1…
771827e8
metal : update sum_rows kernel to support float4 (llama/19524)
91bbbdf6
opencl: add basic support for q4_1 (llama/19534)
a0bcad8d
hexagon: fix typo in vtcm_needs_release (llama/19545)
8e43b5a5
metal : support GGML_OP_SET (llama/19548)
4ed6faf2
metal : improve concurrency (llama/19555)
31c389d0
CUDA: Do not mutate cgraph for fused ADDs (llama/19566)
76896039
CUDA: loop over ne2*ne3 in case it overflows (llama/19538)
9bdef053
fix vulkan ggml_acc only works in 3d but not 4d (llama/19426)
8f9ca9ce
Fix wrong memcpy length for block_interleave == 4 (llama/19575)
0ae3e1df
vulkan: restore -inf check in FA shaders (llama/19582)
fd765784
hexagon: further optimizations and refactoring for flash attention (l…
cacd47af
vulkan: Add vendor id for Qualcomm drivers (llama/19569)
73d40946
vulkan: support GGML_OP_SET (llama/19584)
b915a235
vulkan: support L2_NORM with contiguous rows (llama/19604)
c2655cfe
metal : fix ACC op (llama/19427)
0d9def9f
ggml : fix GGML_DEBUG with OpenMP (llama/19599)
4a206c84
models : optimize qwen3next graph (llama/19375)
07db21f8
sync : ggml
5a11c727
talk-llama : sync llama.cpp
3413099c
ggerganov merged 364c77f4 into master 19 days ago
ggerganov deleted the sync-ggml-26-02-16 branch 19 days ago