CUDA: Fix non-contig rope (llama/19338)
b9c178d4
cuda : extend GGML_OP_PAD to work with non-cont src0 (llama/19429)
0cbf122c
CANN: implement quantized MUL_MAT_ID for MoE models (llama/19228)
fd5cfe14
CANN: Remove unnecessary wrapper for `gml_backend_buft_is_cann` (llam…
30b52c6a
ggml : use noexcept overload for is_regular_file in backend registrat…
2f2b2f5d
ggml-cpu: arm64: q6_K repack gemm and gemv (and generic) implementati…
0d0bf3a9
Plug memory leaks and free resources on shutdown (llama/19315)
3415a2e4
CUDA : Update CCCL-tag for 3.2 to final release from RC (llama/19486)
99418ad3
metal : consolidate unary ops (llama/19490)
be449af3
ggml : extend bin bcast for permuted src1 (llama/19484)
012bd607
hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (llama/19406)
b26b2b19
metal : extend l2_norm support for non-cont src0 (llama/19502)
81aa77c3
ggml : unary ops support non-cont src0 + metal F16 unary ops (llama/1…
6f46caea
opencl: add general Q6_K mm and Q4_K mv (llama/19347)
bf58e6f3
hexagon: further optimization and tuning of matmul and dot kernels (l…
faa42605
Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (llama/1…
771827e8
metal : update sum_rows kernel to support float4 (llama/19524)
91bbbdf6
opencl: add basic support for q4_1 (llama/19534)
a0bcad8d
hexagon: fix typo in vtcm_needs_release (llama/19545)
8e43b5a5
metal : support GGML_OP_SET (llama/19548)
4ed6faf2
metal : improve concurrency (llama/19555)
31c389d0
CUDA: Do not mutate cgraph for fused ADDs (llama/19566)
76896039
CUDA: loop over ne2*ne3 in case it overflows (llama/19538)
9bdef053
fix vulkan ggml_acc only works in 3d but not 4d (llama/19426)
8f9ca9ce
Fix wrong memcpy length for block_interleave == 4 (llama/19575)
0ae3e1df
vulkan: restore -inf check in FA shaders (llama/19582)
fd765784
hexagon: further optimizations and refactoring for flash attention (l…
cacd47af
vulkan: Add vendor id for Qualcomm drivers (llama/19569)
73d40946
vulkan: support GGML_OP_SET (llama/19584)
b915a235
vulkan: support L2_NORM with contiguous rows (llama/19604)
c2655cfe
metal : fix ACC op (llama/19427)
0d9def9f
ggml : fix GGML_DEBUG with OpenMP (llama/19599)
4a206c84
models : optimize qwen3next graph (llama/19375)
07db21f8
sync : ggml
5a11c727
talk-llama : sync llama.cpp
3413099c
ggerganov merged 364c77f4 into master 19 days ago
ggerganov deleted the sync-ggml-26-02-16 branch 19 days ago