PR #3824 sync : ggml - SemanticDiff

sync : ggml #3824

ggerganov merged 55 commits into master from sync-ggml-26-05-25

PMZFX

SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocat…

72121424

0cc4m

vulkan: fix matmul integer pipeline selection (llama/23005)

610058ae

alex-spacemit

ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (llam…

6c502076

ggerganov

logs : reduce (llama/23021)

f4f2d70c

ArberSephirotheca

ggml-webgpu: makes the flash attn vec path subgroup-aware (llama/23040)

b333d4d8

JohannesGaessler

HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (llama/22880)

9b73285a

pdhinaka

ggml-hexagon: cpy: add contiguous fast-path in reshape copy (llama/23…

8436fb28

am17an

llama + spec: MTP Support (llama/22673)

1e50c6c7

ggerganov

ggml : bump version to 0.12.0 (ggml/1494)

130cd40e

Dev-X25874

ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (…

2b85f66f

OriPekelman

ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500)

bb01f457

winstonma

vulkan: removed duplicate #include <memory> in headers (llama/23144)

fcd61107

jeffbolznv

vulkan: fuse SSM_CONV + BIAS + SILU (llama/22653)

17d61534

jeffbolznv

vulkan: Support unaligned tensors for ROPE (llama/22637)

621cbd86

ServeurpersoCom

vulkan: add cpy bf16 -> f32 pipelines (llama/22677)

ac163066

jeeb

ggml-vulkan/CMakeLists: add a check for SPIRV-Headers (llama/22009)

7187e1f2

ORippler

CUDA: Continue directly including cuda/iterator (llama/23102)

703eda1e

gabe-l-hart

feat: Support d_conv=15 for ssm-conv.cu (llama/23017)

323bc2d0

aicss-genai

sycl: route small f32 matmuls to oneMKL, bypass oneDNN (llama/22150)

6a5a4993

aicss-genai

sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (llama/22156)

01bf2afc

pdhinaka

ggml-hexagon: add PAD op HVX kernel (llama/23078)

b94493a7

pdhinaka

hexagon: add support for TRI op (llama/22822)

c4631bbc

rgerganov

rpc : keep last_graph_uid in the device context (llama/23273)

3d73095d

aicss-genai

sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle (llama/22153)

092d4661

reeselevine

ggml-webgpu : extend GDN for K>1 (llama/23299)

3fbd4a79

aparmp-quic

hexagon: enable support for NORM op (llama/23319)

05bf9c48

aparmp-quic

hexagon: add MROPE and IMROPE support in HTP rope op (llama/23317)

752744d5

shaofeiqi

opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (llama/23303)

21612e65

ravel7524

ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps (llama/23349)

89f3135a

ggerganov

metal : optimize pad + cpy (llama/23354)

34d3c6b9

aendk

Programmatic Dependent Launch (PDL) for more performance on newer NVI…

a34c0240

max-krasnyansky

hexagon: HMX quantized matmul rework (llama/23368)

64bdb605

daniandtheweb

vulkan: optimize operations in the IM2COL shader (llama/22685)

e9b7cc8c

lhez

opencl: refactor backend initilization (llama/23318)

6ce303bc

tboinovski1

hexagon: ssm-conv fix for large prompts (llama/23307)

3d596aff

TheBlueMatt

ggml : Check the right iface method before using the fallback 2d get …

2b987100

ggerganov

metal : optimize concat kernel and fix set kernel threads (llama/23411)

10254e3b

Constannnnnt

fix(flash-attn): replace f32 with kv_type and q_type (llama/23372)

0e74cab1

ServeurpersoCom

vulkan: fuse snake activation (mul, sin, sqr, mul, add) (llama/22855)

9c206f7a

JohannesGaessler

CUDA: fix PDL CC check for JIT compilation (llama/23471)

ad494e3d

z-sachin

ggml-zendnn : add Q8_0 quantization support (llama/23414)

f86ab6f2

PMZFX

SYCL: add BF16 to DMMV kernel path (~4x tg speedup on Intel Arc) (lla…

e3988d4f

karavayev

SYCL : gated_delta_net K>1 (llama/23174)

4e99dde7

sanmai

sycl : Level Zero detection in ggml_sycl_init (llama/23097)

4dbad751

sanmai

SYCL: improve MoE prefill throughput (llama/23142)

7084cf00

shawngu-quic

opencl: generalize Adreno MoE kernels on M (llama/23449)

59477535

jeffbolznv

vulkan: fix windows find_package of SPIRV-Headers (llama/23215)

2282c7f5

dskwe

ggml : Check the right iface method before using the fallback 2d get …

945fb5f2

njsyw1997

hexagon: apply repl optimization in flash attn softmax as #22993 (lla…

89cb85fb

shaofeiqi

opencl: batch profiling to improve speed and prevent memory leaks (ll…

56ee0862

JohannesGaessler

TP: fix entirely zero-sized slices per device (llama/23525)

2bb19cb4

jeffbolznv

ggml : Parallelize quant LUT init (llama/23595)

a16e642a

ggerganov

ggml : bump version to 0.12.1 (ggml/1508)

624bac19

ggerganov

sync : ggml

77ab0a00

ggerganov

talk-llama : sync llama.cpp

9ff9972c

ggerganov merged 865ec171 into master 27 days ago

ggerganov deleted the sync-ggml-26-05-25 branch 27 days ago
