ggml / sync : llama.cpp #1311 (Merged)
ggerganov merged 23 commits into master from sync-llama.cpp-25-07-24
1d1866d7  Vulkan: Fix fprintf format-security warning (llama/14770)
c5262c2c  vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#132…
d45d283a  ggml: adds CONV_2D op and direct GEMM Vulkan implementation (llama/14…
27435226  vulkan/cuda: Fix im2col when KW!=KH (llama/14789)
aa65fde7  kleidiai: add support for get_rows (llama/14676)
de18e9a7  sycl: Fix im2col (llama/14797)
58f48321  opencl: add conv2d kernel (llama/14403)
64088bbc  opencl: fix `im2col` when `KW!=KH` (llama/14803)
5485663d  cuda: remove linking to cublasLt (llama/14790)
5b97e8ed  opencl: remove unreachable `return` (llama/14806)
1dad821b  cuda : implement bf16 cpy ops and enable bf16 cont (llama/14763)
6bc9c977  vulkan: fix rms_norm_mul to handle broadcasting dim0 (llama/14817)
b06d9cbf  CUDA: add fused rms norm (llama/14800)
0ed1969e  CANN: weight format to NZ for Ascend310P3 (llama/14407)
a44689f0  ggml: fix loongarch quantize_row_q8_1 error (llama/14827)
d4951584  tests : add non-cont K,V FA tests
e27e2cd6  CUDA: fix quantized KV cache + multiple sequences (llama/14822)
21c3ebd0  CUDA: fix compilation with GGML_CUDA_F16 (llama/14837)
5d1cc399  CUDA: fix overflow in FA, tune performance (llama/14840)
1d54f61a  sycl: fix undefined variable in work group size check (llama/14843)
e2829867  metal : fix fusion across different encoders (llama/14849)
8dcd3dc4  sycl: fixed semantics of block offset calculation (llama/14814)
ee456e8d  sync : llama.cpp
danbev approved these changes on 2025-07-24
ggerganov merged ac842675 into master 161 days ago
ggerganov deleted the sync-llama.cpp-25-07-24 branch 161 days ago
Reviewers: danbev
Assignees: no one assigned
Labels: none yet
Milestone: no milestone