ggml
sync : llama.cpp
#1160
Merged

ggerganov merged 56 commits into master from sync-llama.cpp-25-03-27
danbev ggml : skip intermediate .air file when compiling .metallib (llama/12…
975e013e
ctrysbita ggml-backend : make path_str compatible with C++20 (llama/12269)
6f1482f0
ggerganov tests : fix test-quantize-fns to init the CPU backend (llama/12306)
3b9ab121
opencl: use OpenCL C standard supported by the device (llama/12221)
bd498fb4
yeahdongcn musa: support new arch mp_31 and update doc (llama/12296)
f5489240
netrunnereve mat vec double buffer (llama/12188)
0a9761e9
BB-fat metal : Cache the Metal library at the device context level (llama/12…
9091eeae
ggml-backend : fix backend search path (llama/12330)
60bd86a0
IMbackK CUDA/HIP: refractor mmqv to unify the calculation of nwarps and rows …
51e47802
jeffbolznv vulkan: fix bug in coopmat1 mul_mat_id (llama/12316)
fbedb178
IMbackK CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
505422f2
sycl : variable sg_size support for mmvq kernels (llama/12336)
ab5b0d11
noemotiovon MUL_MAT optimization (llama/12382)
eb84db8f
fairydreaming SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (…
96c5d142
aubreyli SYCL: Delete redundant plus sign and space (llama/12391)
afbf61d5
qnixsynapse SYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366)
1c8153d1
ckastner cmake : enable building llama.cpp using system libggml (llama/12321)
f6bf093d
jeffbolznv vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258)
5c70888c
jeffbolznv vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bound…
64386ff0
jeffbolznv vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309)
a373dd2b
daniandtheweb vulkan: subgroup size tuning (llama/12087)
2662d5da
jeffbolznv vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312)
26c8697c
guusw ggml-vulkan: remove unused find_program(glslc) (llama/12416)
915c4f8a
gaugarg-nv cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
5aef5af2
MollySophia llama: Add support for RWKV v7 architecture (llama/12412)
f8d81ea6
lslusarczyk fixed compilation warnings in ggml-sycl (llama/12424)
7018b32b
0cc4m Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentat…
3eb60975
fj-y-saito ggml : add SVE support for q6_K_q8_K (llama/12361)
6a168b10
lslusarczyk SYCL: using graphs is configurable by environment variable and compil…
245b76f4
yeahdongcn musa: override warp_size of musa device to 32 (llama/12445)
2a94810e
lhez opencl: improve profiling (llama/12442)
0a05a500
jeffbolznv vulkan: Submit once enough matmul work has been recorded (llama/12406)
9f34a512
guusw Fix visionOS build and add CI (llama/12415)
cb26fbfb
jeffbolznv vulkan: optimize iq1 coopmat2 dequant functions (llama/12427)
39748c11
gaugarg-nv CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llam…
2a02e67a
Srihari-mcw ggml : block interleaving support for Q4_K quantization for x86 AVX2 …
bb1d5da2
sgeor255 sycl: cleanup oneDNN related code (llama/12097)
e8d18a49
MakeDecisionWorth Fix build on Windows when ccache enabled (#9954) (llama/9976)
61b74517
netrunnereve vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/1…
d44565f6
stduhpf Vulkan: RTE rounding for cpy to quant (llama/12480)
f7be1eb5
jeffbolznv vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)
60beda8c
yeahdongcn musa: refine compute capability (llama/12493)
d385fc7f
ggerganov ggml : fix quantized cpy op (llama/12310)
80ce83df
jeffbolznv vulkan: fix mul_mat_vec failure in backend tests (llama/12529)
51937836
yeahdongcn CUDA: Fix clang warnings (llama/12540)
3b5918f2
lhez opencl: simplify kernel embedding logic in cmakefile (llama/12503)
935dab3a
qnixsynapse SYCL: disable Q4_0 reorder optimization (llama/12560)
e52a9edf
eddnjjn ggml-cpu : update KleidiAI to v1.5.0 (llama/12568)
25177579
ggerganov ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544)
1f4255ce
ggerganov metal : refactor mat-vec code (llama/12569)
6f57c7b0
slojosic-amd HIP: Add support for RDNA4 targets (llama/12372)
44fd23e5
qnixsynapse SYCL: implement memset ggml backend buffer interface (llama/12580)
5bc40006
amritahs-ibm llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)
ff66af54
ggerganov ggml : sync/merge cmake,riscv,powerpc, add common.cmake (#0)
89205af3
ggerganov sync : llama.cpp
4c4f07ab
ggerganov files : remove old wkv6 sources (#0)
c838c22e
ggerganov merged 660def06 into master 1 year ago
ggerganov deleted the sync-llama.cpp-25-03-27 branch 1 year ago