sync : ggml #3824

ggerganov merged 55 commits into master from sync-ggml-26-05-25
PMZFX SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocat…
72121424
0cc4m vulkan: fix matmul integer pipeline selection (llama/23005)
610058ae
alex-spacemit ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (llam…
6c502076
ggerganov logs : reduce (llama/23021)
f4f2d70c
ArberSephirotheca ggml-webgpu: makes the flash attn vec path subgroup-aware (llama/23040)
b333d4d8
JohannesGaessler HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (llama/22880)
9b73285a
pdhinaka ggml-hexagon: cpy: add contiguous fast-path in reshape copy (llama/23…
8436fb28
am17an llama + spec: MTP Support (llama/22673)
1e50c6c7
ggerganov ggml : bump version to 0.12.0 (ggml/1494)
130cd40e
Dev-X25874 ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (…
2b85f66f
OriPekelman ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500)
bb01f457
winstonma vulkan: removed duplicate #include <memory> in headers (llama/23144)
fcd61107
jeffbolznv vulkan: fuse SSM_CONV + BIAS + SILU (llama/22653)
17d61534
jeffbolznv vulkan: Support unaligned tensors for ROPE (llama/22637)
621cbd86
ServeurpersoCom vulkan: add cpy bf16 -> f32 pipelines (llama/22677)
ac163066
jeeb ggml-vulkan/CMakeLists: add a check for SPIRV-Headers (llama/22009)
7187e1f2
ORippler CUDA: Continue directly including cuda/iterator (llama/23102)
703eda1e
gabe-l-hart feat: Support d_conv=15 for ssm-conv.cu (llama/23017)
323bc2d0
aicss-genai sycl: route small f32 matmuls to oneMKL, bypass oneDNN (llama/22150)
6a5a4993
aicss-genai sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (llama/22156)
01bf2afc
pdhinaka ggml-hexagon: add PAD op HVX kernel (llama/23078)
b94493a7
pdhinaka hexagon: add support for TRI op (llama/22822)
c4631bbc
rgerganov rpc : keep last_graph_uid in the device context (llama/23273)
3d73095d
aicss-genai sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle (llama/22153)
092d4661
reeselevine ggml-webgpu : extend GDN for K>1 (llama/23299)
3fbd4a79
aparmp-quic hexagon: enable support for NORM op (llama/23319)
05bf9c48
aparmp-quic hexagon: add MROPE and IMROPE support in HTP rope op (llama/23317)
752744d5
shaofeiqi opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (llama/23303)
21612e65
ravel7524 ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps (llama/23349)
89f3135a
ggerganov metal : optimize pad + cpy (llama/23354)
34d3c6b9
aendk Programmatic Dependent Launch (PDL) for more performance on newer NVI…
a34c0240
max-krasnyansky hexagon: HMX quantized matmul rework (llama/23368)
64bdb605
daniandtheweb vulkan: optimize operations in the IM2COL shader (llama/22685)
e9b7cc8c
lhez opencl: refactor backend initilization (llama/23318)
6ce303bc
tboinovski1 hexagon: ssm-conv fix for large prompts (llama/23307)
3d596aff
TheBlueMatt ggml : Check the right iface method before using the fallback 2d get …
2b987100
ggerganov metal : optimize concat kernel and fix set kernel threads (llama/23411)
10254e3b
Constannnnnt fix(flash-attn): replace f32 with kv_type and q_type (llama/23372)
0e74cab1
ServeurpersoCom vulkan: fuse snake activation (mul, sin, sqr, mul, add) (llama/22855)
9c206f7a
JohannesGaessler CUDA: fix PDL CC check for JIT compilation (llama/23471)
ad494e3d
z-sachin ggml-zendnn : add Q8_0 quantization support (llama/23414)
f86ab6f2
PMZFX SYCL: add BF16 to DMMV kernel path (~4x tg speedup on Intel Arc) (lla…
e3988d4f
karavayev SYCL : gated_delta_net K>1 (llama/23174)
4e99dde7
sanmai sycl : Level Zero detection in ggml_sycl_init (llama/23097)
4dbad751
sanmai SYCL: improve MoE prefill throughput (llama/23142)
7084cf00
shawngu-quic opencl: generalize Adreno MoE kernels on M (llama/23449)
59477535
jeffbolznv vulkan: fix windows find_package of SPIRV-Headers (llama/23215)
2282c7f5
dskwe ggml : Check the right iface method before using the fallback 2d get …
945fb5f2
njsyw1997 hexagon: apply repl optimization in flash attn softmax as #22993 (lla…
89cb85fb
shaofeiqi opencl: batch profiling to improve speed and prevent memory leaks (ll…
56ee0862
JohannesGaessler TP: fix entirely zero-sized slices per device (llama/23525)
2bb19cb4
jeffbolznv ggml : Parallelize quant LUT init (llama/23595)
a16e642a
ggerganov ggml : bump version to 0.12.1 (ggml/1508)
624bac19
ggerganov sync : ggml
77ab0a00
ggerganov talk-llama : sync llama.cpp
9ff9972c
ggerganov merged 865ec171 into master 27 days ago
ggerganov deleted the sync-ggml-26-05-25 branch 27 days ago
