sync : ggml #3566

ggerganov merged 135 commits into master from sync-ggml-25-12-12
danbev ggml : remove dirty flag from version string (ggml/1391)
41831494
angt ggml : add missing AVX512 feature checks (llama/17270)
bb88c254
angt cmake : fix ARM feature verification (llama/17170)
9e429c47
0cc4m vulkan: add log RTE support to fix Nvidia CI (llama/17320)
b7dfced3
jeffbolznv vulkan: support noncontig i32 copy (llama/17328)
24b981ef
noemotiovon CANN: fix acl_tensor_ptr usage in ASCEND_310P ROPE (llama/17347)
c137d11b
JeremyRand ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (llama…
27c69271
0cc4m vulkan: force full subgroups for flash attention to fix intel subgrou…
2097a9c1
pwilkin Fix too relaxed check on CUDA "fast copy" (can_be_transposed) conditi…
746cbed2
am17an cuda: fix rope fusion for gemma3 (llama/17378)
73d39682
jeffbolznv vulkan: Add copy_transpose shader (llama/17371)
ae8865c6
jeffbolznv vulkan: support larger argsort (llama/17313)
95d0b0b0
giuseppe vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FL…
24b14cad
ixgbe ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (…
1d3a5250
sudhiarm kleidiai: fix zero-size array declaration (llama/17240)
51f54380
angt ggml : remove useless and error-prone variadic macros (llama/17399)
2f20938b
sfudally-nvidia DGX Spark: UMA support (llama/17368)
510805e6
pwilkin ggml : Fix transposed SOLVE_TRI result (llama/17323)
46f893c2
chraac ggml-hexagon: fix swiglu failure at `test-backend-ops` (llama/17344)
cb3ee1b0
rauletorresc CANN: Refactor `evaluate_and_capture_cann_graph` (llama/17333)
a009dc17
jeffbolznv vulkan: disable async for older Intel devices (llama/17369)
cdc1a776
lhez opencl: refine condition for kqv mm (llama/17392)
5c0e4a9c
zhang-hui-yulo HIP: RDNA4 tensor core support for MMF (llama/17077)
fc6eae78
jeffbolznv vulkan: remove a couple unnecessary switches (llama/17419)
deb4958a
CISC cuda : support non-contiguous i32 to i32 copy (llama/17326)
61e0b7ed
chraac ggml-hexagon: add `hex_supported_buffer` for better buffer supported …
621cb871
mediouni-m ggml-hexagon: Initial Hexagon v68/v69 support (llama/17394)
75cea7f8
rauletorresc CANN: Define `cann_graph_update_required` before macro (llama/17434)
5ed0ddc4
max-krasnyansky hexagon: add support for ROPE_NEOX (llama/17458)
77d874b1
ixgbe ggml: add RISC-V cpu-feats (llama/17461)
faf37ffe
Alcpz ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (ll…
f4ede89d
jiachengjason HIP: WMMA-MMQ kernels for RDNA 4 (llama/17156)
371a2186
jeffbolznv vulkan: more FA details in vk_perf_logger (llama/17443)
553d57a4
jeffbolznv vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 …
273e4fe7
TianHao324 CANN: supports out_prod operator for F32 and F16 (llama/17406)
e00bb753
ggerganov ggml : add ggml_top_k (llama/17365)
968db8bc
jeffbolznv vulkan: Implement GGML_OP_CUMSUM (llama/17479)
20845004
hipudding CANN: Add MROPE and IMROPE support (llama/17401)
f0c54d47
jiachengjason HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (llama/17502)
bb7223da
angt ggml : fix ARM feature verification (llama/17519)
8e3560c7
xctan ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (llama/17448)
fb31a197
jeffbolznv vulkan: Implement top-k (llama/17418)
d8b61e05
jeffbolznv vulkan: allow graph_optimize for prompt processing workloads (llama/1…
c8050e5f
Alcpz Fix chunks being too small with small matrix sizes (llama/17526)
3de43724
lhez opencl: add sqr, sqrt, mean and ssm_conv (llama/17476)
74ef5dd1
jeffbolznv vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (llama/17514)
310db24f
Acly vulkan : move contiguous checks to device_supports_op (llama/17490)
ac92424b
Alcpz ggml-cpu: aarm64: q4_K repack gemm and gemv implementations (dotprod …
93f6cdb9
matt23654 cuda : fix UMA detection on discrete GPUs. (llama/17537)
e682af78
jeffbolznv vulkan: Implement SOLVE_TRI (llama/17486)
3727a36c
NeoZhangJianyu refactor pad_reflect_1d to make the UT case pass (llama/17204)
93bc8dc5
pwilkin SOLVE_TRI CUDA kernel for small matrices (llama/17457)
51e842d1
zhang-hui-yulo HIP: enable mul_mat_f for RDNA4 (llama/17437)
f92d542d
rgerganov rpc : cache and reuse compute graphs (llama/15405)
d26d1c8b
jeffbolznv vulkan: Implement GGML_OP_TRI (llama/17503)
7a209631
JohannesGaessler CUDA: no FP16 arithmetic for vector FA kernel (llama/17558)
37e4c2ed
pwilkin model : Qwen3 Next (llama/16095)
43441ff5
am17an ggml-cuda: add stricter checking for fusion (llama/17568)
90ca4e0a
yeahdongcn enable fp16/fast_fp16/bf16_mma on PH1 (llama/17551)
c372bdbb
slaren ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in g…
463003e7
jeffbolznv vulkan: improve topk perf for large k, fix overflow in unit tests (ll…
dbf8766f
0cc4m Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (llama/16900)
2fcc0a3a
ixgbe ggml: replace hwcap with riscv_hwprobe for RVV detection (llama/17567)
28dff065
arthw sycl : support to malloc memory on device more than 4GB, update the d…
a3459484
Acly vulkan : fix FA mask load with bounds check (coopmat2) (llama/17606)
2258930c
Mahekk357 cuda : add error checking for cudaMemcpyAsync in argsort (llama/17599)
2e4a7a21
am17an CUDA: add stream-based concurrency (llama/16991)
e68ee6e2
giladgd ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` (llama/17581)
70664720
tdakhran model: LFM2-VL fixes (llama/17577)
0defeee6
am17an llama-graph: avoid expand_forward for fusion (llama/17633)
6cc2d053
ggerganov ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler …
7cd3de89
ggerganov metal : add FA head size 48 (llama/17619)
32090930
NeoZhangJianyu enhance argsort for UT (llama/17573)
26732d28
am17an ggml-cuda: reorder only relevant nodes (llama/17639)
4c89232b
angt ggml : add fallback definition for HWCAP2_SVE2 (llama/17683)
e2537b4a
danbev ggml : remove redundant n_copies check when setting input/output (lla…
201b9107
TianHao324 CANN: Disable Ger operator of OUT_PROD on 310p device (llama/17563)
a64d46a5
angt ggml : use svcntb() for SVE vector length detection (llama/17474)
16688c6d
xiaobing318 cmake : add utf8 compilation options for msvc (llama/17682)
fffdf679
jeffbolznv vulkan: Reduce temporary memory usage for TOP_K (llama/17623)
86cb5ab9
reeselevine ggml webgpu: add support for emscripten builds (llama/17184)
d263bdbf
ggerganov metal : fix data race in pipeline library (llama/17731)
4a00f2e3
JohannesGaessler CUDA: generalized (mma) FA, add Volta support (llama/17505)
7adbcafb
GermanAizek ggml-cpu: remove duplicate conditional check 'iid' (llama/17650)
3794a0d3
angt build : move _WIN32_WINNT definition to headers (llama/17736)
92e50155
ggerganov metal : use params per pipeline instance (llama/17739)
194d0164
Alcpz ggml-cpu : remove asserts always evaluating to false (llama/17728)
f96ebc92
gabe-l-hart metal: TRI, FILL, EXPM1, SOFTPLUS (llama/16623)
8902c9d9
pwilkin Add support for CUMSUM and TRI for CUDA. (llama/17584)
8d44d618
jiachengjason HIP: enable WMMA-MMQ INT kernels for RDNA 3 (llama/17576)
e3f3c6ea
JohannesGaessler CUDA: fix FA VKQ accumulator overflow (llama/17746)
14502d65
shalinib-ibm Q4/Q8 Tiled Gemm Optimization. (llama/16999)
d30b7440
JohannesGaessler HIP : fix RDNA4 build (llama/17792)
4170159d
ggerganov metal : add residency sets keep-alive heartbeat (llama/17766)
322903fa
ggerganov rpc : fix alloc size logic (llama/17116)
aefcd75f
jeffbolznv vulkan: set all memory allocations to high priority (llama/17624)
32ba1ec8
jeffbolznv vulkan: enable mmvq for q2_k on NVIDIA (llama/17675)
7e97d3b0
reeselevine ggml webgpu: unary op support, code refactoring, ops support (llama/…
23984be4
Acly vulkan : support conv-2d with large output size (llama/17685)
0b53759b
jeffbolznv vulkan: fix top_k bug when there are ties in the input (llama/17659)
0484147a
jeffbolznv vulkan: add more num_blocks instantiations in rms_norm (llama/17701)
64a3f573
rillomas vulkan: Fix mismatch in TOPK_MOE unit test (llama/17541)
191e5f46
rillomas vulkan: Replace deprecated VK_EXT_validation_features (llama/17637)
a8d02735
ggerganov metal : fix build (#17799)
41cf229d
jeffbolznv vulkan: support solve_tri with larger N/K values (llama/17781)
875d8614
jeffbolznv vulkan: Use one row per workgroup for f32 mmv (llama/17711)
c66c71e9
flyinskyin2013 ggml : improve error handling for search path existence checks (llama…
b67e3abd
JohannesGaessler HIP: fix RDNA3 FP16/BF16 matrix multiplication (llama/17817)
94be7191
Phylliida ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU …
c5e18070
z-vishal ggml-zendnn : add ZenDNN backend for AMD CPUs (llama/17690)
ebff8f9d
jeffbolznv vulkan: perf_logger improvements (llama/17672)
898f876f
yingying0906 sycl: add missing BF16 conversion support for Intel oneAPI (llama/17780)
447ef863
lovedheart Vulkan: improve mul_mat_vec_iq1_m (llama/16907)
d6d44fac
ixgbe ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (llama/1…
c8d0ee2f
wsbagnsv1 cuda: optimize SOLVE_TRI using registers and FMAF (llama/17703)
e1562e85
JayZenith cuda : add FILL op support (llama/17851)
821c2071
JohannesGaessler CUDA: fix FP16 overflow in tile FA kernel (llama/17875)
bef1f5a5
noemotiovon CANN: add support for partial RoPE and Vision mode (llama/17543)
79d86a5c
CISC ggml : allow fill node alloc inplace (llama/17870)
ba463fb5
ggerganov metal : print node names for debugging (llama/17882)
b6ae0b29
gabe-l-hart ggml : Provide macos-specific backtrace printing to avoid terminal de…
41bbc034
pwilkin Add DIAG for CUDA (llama/17873)
2817582b
gabe-l-hart metal: SSM kernel improvements (llama/17876)
307dc525
NeoZhangJianyu fix softmax for iGPU (llama/17838)
c10b4f9a
JohannesGaessler CUDA: fix unpadded strides in MMA FA kernel (llama/17891)
ea182913
CISC cuda : add missing support check for xielu (llama/17895)
ca8ea18d
ggerganov ggml : remove GGML_KQ_MASK_PAD constant (llama/17910)
cd9b8c6d
max-krasnyansky Fix race conditions in threadpool when dealing with dynamic/frequent …
a2886fba
chraac ggml-hexagon: fix `rope` failure at `test-backend-ops` (llama/17565)
0c88de5c
ggerganov ggml-alloc : fix reuse-parent logic for misaligned sizes (llama/17884)
1da1a686
HerrCai0907 cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build…
324dd21d
ggerganov whisper : adjust to ggml changes (#0)
72714d16
ggerganov sync : ggml
48cdc06e
ggerganov talk-llama : sync llama.cpp
179d8b1c
ggerganov force pushed from 97a2f89e to 179d8b1c 124 days ago
ggerganov ggml : arm repack fix build (#0)
f0c9017a
ggerganov merged f0c9017a into master 124 days ago
