PR #1160 sync : llama.cpp

ggml : skip intermediate .air file when compiling .metallib (llama/12…

975e013e

ggml-backend : make path_str compatible with C++20 (llama/12269)

6f1482f0

tests : fix test-quantize-fns to init the CPU backend (llama/12306)

3b9ab121

opencl: use OpenCL C standard supported by the device (llama/12221)

bd498fb4

musa: support new arch mp_31 and update doc (llama/12296)

f5489240

mat vec double buffer (llama/12188)

0a9761e9

metal : Cache the Metal library at the device context level (llama/12…

9091eeae

ggml-backend : fix backend search path (llama/12330)

60bd86a0

CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows …

51e47802

vulkan: fix bug in coopmat1 mul_mat_id (llama/12316)

fbedb178

CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)

505422f2

sycl : variable sg_size support for mmvq kernels (llama/12336)

ab5b0d11

MUL_MAT optimization (llama/12382)

eb84db8f

SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (…

96c5d142

SYCL: Delete redundant plus sign and space (llama/12391)

afbf61d5

SYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366)

1c8153d1

cmake : enable building llama.cpp using system libggml (llama/12321)

f6bf093d

vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258)

5c70888c

vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bound…

64386ff0

vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309)

a373dd2b

vulkan: subgroup size tuning (llama/12087)

2662d5da

vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312)

26c8697c

ggml-vulkan: remove unused find_program(glslc) (llama/12416)

915c4f8a

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)

5aef5af2

llama: Add support for RWKV v7 architecture (llama/12412)

f8d81ea6

fixed compilation warnings in ggml-sycl (llama/12424)

7018b32b

Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentat…

3eb60975

ggml : add SVE support for q6_K_q8_K (llama/12361)

6a168b10

SYCL: using graphs is configurable by environment variable and compil…

245b76f4

musa: override warp_size of musa device to 32 (llama/12445)

2a94810e

opencl: improve profiling (llama/12442)

0a05a500

vulkan: Submit once enough matmul work has been recorded (llama/12406)

9f34a512

Fix visionOS build and add CI (llama/12415)

cb26fbfb

vulkan: optimize iq1 coopmat2 dequant functions (llama/12427)

39748c11

CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llam…

2a02e67a

ggml : block interleaving support for Q4_K quantization for x86 AVX2 …

bb1d5da2

sycl: cleanup oneDNN related code (llama/12097)

e8d18a49

Fix build on Windows when ccache enabled (#9954) (llama/9976)

61b74517

vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/1…

d44565f6

Vulkan: RTE rounding for cpy to quant (llama/12480)

f7be1eb5

vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)

60beda8c

musa: refine compute capability (llama/12493)

d385fc7f

ggml : fix quantized cpy op (llama/12310)

80ce83df

vulkan: fix mul_mat_vec failure in backend tests (llama/12529)

51937836

CUDA: Fix clang warnings (llama/12540)

3b5918f2

opencl: simplify kernel embedding logic in cmakefile (llama/12503)

935dab3a

SYCL: disable Q4_0 reorder optimization (llama/12560)

e52a9edf

ggml-cpu : update KleidiAI to v1.5.0 (llama/12568)

25177579

ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544)

1f4255ce

metal : refactor mat-vec code (llama/12569)

6f57c7b0

HIP: Add support for RDNA4 targets (llama/12372)

44fd23e5

SYCL: implement memset ggml backend buffer interface (llama/12580)

5bc40006

llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)

ff66af54

ggml : sync/merge cmake,riscv,powerpc, add common.cmake (#0)

89205af3

sync : llama.cpp

4c4f07ab

files : remove old wkv6 sources (#0)

c838c22e

ggerganov merged 660def06 into master 1 year ago

ggerganov deleted the sync-llama.cpp-25-03-27 branch 1 year ago
