sync : ggml #3566

ggerganov merged 135 commits into master from sync-ggml-25-12-12
danbev ggml : remove dirty flag from version string (ggml/1391)
41831494
angt ggml : add missing AVX512 feature checks (llama/17270)
bb88c254
angt cmake : fix ARM feature verification (llama/17170)
9e429c47
0cc4m vulkan: add log RTE support to fix Nvidia CI (llama/17320)
b7dfced3
jeffbolznv vulkan: support noncontig i32 copy (llama/17328)
24b981ef
noemotiovon CANN: fix acl_tensor_ptr usage in ASCEND_310P ROPE (llama/17347)
c137d11b
JeremyRand ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (llama…
27c69271
0cc4m vulkan: force full subgroups for flash attention to fix intel subgrou…
2097a9c1
pwilkin Fix too relaxed check on CUDA "fast copy" (can_be_transposed) conditi…
746cbed2
am17an cuda: fix rope fusion for gemma3 (llama/17378)
73d39682
jeffbolznv vulkan: Add copy_transpose shader (llama/17371)
ae8865c6
jeffbolznv vulkan: support larger argsort (llama/17313)
95d0b0b0
giuseppe vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FL…
24b14cad
ixgbe ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (…
1d3a5250
sudhiarm kleidiai: fix zero-size array declaration (llama/17240)
51f54380
angt ggml : remove useless and error-prone variadic macros (llama/17399)
2f20938b
sfudally-nvidia DGX Spark: UMA support (llama/17368)
510805e6
pwilkin ggml : Fix transposed SOLVE_TRI result (llama/17323)
46f893c2
chraac ggml-hexagon: fix swiglu failure at `test-backend-ops` (llama/17344)
cb3ee1b0
rauletorresc CANN: Refactor `evaluate_and_capture_cann_graph` (llama/17333)
a009dc17
jeffbolznv vulkan: disable async for older Intel devices (llama/17369)
cdc1a776
lhez opencl: refine condition for kqv mm (llama/17392)
5c0e4a9c
zhang-hui-yulo HIP: RDNA4 tensor core support for MMF (llama/17077)
fc6eae78
jeffbolznv vulkan: remove a couple unnecessary switches (llama/17419)
deb4958a
CISC cuda : support non-contiguous i32 to i32 copy (llama/17326)
61e0b7ed
chraac ggml-hexagon: add `hex_supported_buffer` for better buffer supported …
621cb871
mediouni-m ggml-hexagon: Initial Hexagon v68/v69 support (llama/17394)
75cea7f8
rauletorresc CANN: Define `cann_graph_update_required` before macro (llama/17434)
5ed0ddc4
max-krasnyansky hexagon: add support for ROPE_NEOX (llama/17458)
77d874b1
ixgbe ggml: add RISC-V cpu-feats (llama/17461)
faf37ffe
Alcpz ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (ll…
f4ede89d
jiachengjason HIP: WMMA-MMQ kernels for RDNA 4 (llama/17156)
371a2186
jeffbolznv vulkan: more FA details in vk_perf_logger (llama/17443)
553d57a4
jeffbolznv vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 …
273e4fe7
TianHao324 CANN: supports out_prod operator for F32 and F16 (llama/17406)
e00bb753
ggerganov ggml : add ggml_top_k (llama/17365)
968db8bc
jeffbolznv vulkan: Implement GGML_OP_CUMSUM (llama/17479)
20845004
hipudding CANN: Add MROPE and IMROPE support (llama/17401)
f0c54d47
jiachengjason HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (llama/17502)
bb7223da
angt ggml : fix ARM feature verification (llama/17519)
8e3560c7
xctan ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (llama/17448)
fb31a197
jeffbolznv vulkan: Implement top-k (llama/17418)
d8b61e05
jeffbolznv vulkan: allow graph_optimize for prompt processing workloads (llama/1…
c8050e5f
Alcpz Fix chunks being too small with small matrix sizes (llama/17526)
3de43724
lhez opencl: add sqr, sqrt, mean and ssm_conv (llama/17476)
74ef5dd1
jeffbolznv vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (llama/17514)
310db24f
Acly vulkan : move contiguous checks to device_supports_op (llama/17490)
ac92424b
Alcpz ggml-cpu: aarm64: q4_K repack gemm and gemv implementations (dotprod …
93f6cdb9
matt23654 cuda : fix UMA detection on discrete GPUs. (llama/17537)
e682af78
jeffbolznv vulkan: Implement SOLVE_TRI (llama/17486)
3727a36c
NeoZhangJianyu refactor pad_reflect_1d to make the UT case pass (llama/17204)
93bc8dc5
pwilkin SOLVE_TRI CUDA kernel for small matrices (llama/17457)
51e842d1
zhang-hui-yulo HIP: enable mul_mat_f for RDNA4 (llama/17437)
f92d542d
rgerganov rpc : cache and reuse compute graphs (llama/15405)
d26d1c8b
jeffbolznv vulkan: Implement GGML_OP_TRI (llama/17503)
7a209631
JohannesGaessler CUDA: no FP16 arithmetic for vector FA kernel (llama/17558)
37e4c2ed
pwilkin model : Qwen3 Next (llama/16095)
43441ff5
am17an ggml-cuda: add stricter checking for fusion (llama/17568)
90ca4e0a
yeahdongcn enable fp16/fast_fp16/bf16_mma on PH1 (llama/17551)
c372bdbb
slaren ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in g…
463003e7
jeffbolznv vulkan: improve topk perf for large k, fix overflow in unit tests (ll…
dbf8766f
0cc4m Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (llama/16900)
2fcc0a3a
ixgbe ggml: replace hwcap with riscv_hwprobe for RVV detection (llama/17567)
28dff065
arthw sycl : support to malloc memory on device more than 4GB, update the d…
a3459484
Acly vulkan : fix FA mask load with bounds check (coopmat2) (llama/17606)
2258930c
Mahekk357 cuda : add error checking for cudaMemcpyAsync in argsort (llama/17599)
2e4a7a21
am17an CUDA: add stream-based concurrency (llama/16991)
e68ee6e2
giladgd ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` (llama/17581)
70664720
tdakhran model: LFM2-VL fixes (llama/17577)
0defeee6
am17an llama-graph: avoid expand_forward for fusion (llama/17633)
6cc2d053
ggerganov ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler …
7cd3de89
ggerganov metal : add FA head size 48 (llama/17619)
32090930
NeoZhangJianyu enhance argsort for UT (llama/17573)
26732d28
am17an ggml-cuda: reorder only relevant nodes (llama/17639)
4c89232b
angt ggml : add fallback definition for HWCAP2_SVE2 (llama/17683)
e2537b4a
danbev ggml : remove redundant n_copies check when setting input/output (lla…
201b9107
TianHao324 CANN: Disable Ger operator of OUT_PROD on 310p device (llama/17563)
a64d46a5
angt ggml : use svcntb() for SVE vector length detection (llama/17474)
16688c6d
xiaobing318 cmake : add utf8 compilation options for msvc (llama/17682)
fffdf679
jeffbolznv vulkan: Reduce temporary memory usage for TOP_K (llama/17623)
86cb5ab9
reeselevine ggml webgpu: add support for emscripten builds (llama/17184)
d263bdbf
ggerganov metal : fix data race in pipeline library (llama/17731)
4a00f2e3
JohannesGaessler CUDA: generalized (mma) FA, add Volta support (llama/17505)
7adbcafb
GermanAizek ggml-cpu: remove duplicate conditional check 'iid' (llama/17650)
3794a0d3
angt build : move _WIN32_WINNT definition to headers (llama/17736)
92e50155
ggerganov metal : use params per pipeline instance (llama/17739)
194d0164
Alcpz ggml-cpu : remove asserts always evaluating to false (llama/17728)
f96ebc92
gabe-l-hart metal: TRI, FILL, EXPM1, SOFTPLUS (llama/16623)
8902c9d9
pwilkin Add support for CUMSUM and TRI for CUDA. (llama/17584)
8d44d618
jiachengjason HIP: enable WMMA-MMQ INT kernels for RDNA 3 (llama/17576)
e3f3c6ea
JohannesGaessler CUDA: fix FA VKQ accumulator overflow (llama/17746)
14502d65
shalinib-ibm Q4/Q8 Tiled Gemm Optimization. (llama/16999)
d30b7440
JohannesGaessler HIP : fix RDNA4 build (llama/17792)
4170159d
ggerganov metal : add residency sets keep-alive heartbeat (llama/17766)
322903fa
ggerganov rpc : fix alloc size logic (llama/17116)
aefcd75f
jeffbolznv vulkan: set all memory allocations to high priority (llama/17624)
32ba1ec8
jeffbolznv vulkan: enable mmvq for q2_k on NVIDIA (llama/17675)
7e97d3b0
reeselevine ggml webgpu: unary op support, code refactoring, ops support (llama/…
23984be4
Acly vulkan : support conv-2d with large output size (llama/17685)
0b53759b
jeffbolznv vulkan: fix top_k bug when there are ties in the input (llama/17659)
0484147a
jeffbolznv vulkan: add more num_blocks instantiations in rms_norm (llama/17701)
64a3f573
rillomas vulkan: Fix mismatch in TOPK_MOE unit test (llama/17541)
191e5f46
rillomas vulkan: Replace deprecated VK_EXT_validation_features (llama/17637)
a8d02735
ggerganov metal : fix build (#17799)
41cf229d
jeffbolznv vulkan: support solve_tri with larger N/K values (llama/17781)
875d8614
jeffbolznv vulkan: Use one row per workgroup for f32 mmv (llama/17711)
c66c71e9
flyinskyin2013 ggml : improve error handling for search path existence checks (llama…
b67e3abd
JohannesGaessler HIP: fix RDNA3 FP16/BF16 matrix multiplication (llama/17817)
94be7191
Phylliida ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU …
c5e18070
z-vishal ggml-zendnn : add ZenDNN backend for AMD CPUs (llama/17690)
ebff8f9d
jeffbolznv vulkan: perf_logger improvements (llama/17672)
898f876f
yingying0906 sycl: add missing BF16 conversion support for Intel oneAPI (llama/17780)
447ef863
lovedheart Vulkan: improve mul_mat_vec_iq1_m (llama/16907)
d6d44fac
ixgbe ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (llama/1…
c8d0ee2f
wsbagnsv1 cuda: optimize SOLVE_TRI using registers and FMAF (llama/17703)
e1562e85
JayZenith cuda : add FILL op support (llama/17851)
821c2071
JohannesGaessler CUDA: fix FP16 overflow in tile FA kernel (llama/17875)
bef1f5a5
noemotiovon CANN: add support for partial RoPE and Vision mode (llama/17543)
79d86a5c
CISC ggml : allow fill node alloc inplace (llama/17870)
ba463fb5
ggerganov metal : print node names for debugging (llama/17882)
b6ae0b29
gabe-l-hart ggml : Provide macos-specific backtrace printing to avoid terminal de…
41bbc034
pwilkin Add DIAG for CUDA (llama/17873)
2817582b
gabe-l-hart metal: SSM kernel improvements (llama/17876)
307dc525
NeoZhangJianyu fix softmax for iGPU (llama/17838)
c10b4f9a
JohannesGaessler CUDA: fix unpadded strides in MMA FA kernel (llama/17891)
ea182913
CISC cuda : add missing support check for xielu (llama/17895)
ca8ea18d
ggerganov ggml : remove GGML_KQ_MASK_PAD constant (llama/17910)
cd9b8c6d
max-krasnyansky Fix race conditions in threadpool when dealing with dynamic/frequent …
a2886fba
chraac ggml-hexagon: fix `rope` failure at `test-backend-ops` (llama/17565)
0c88de5c
ggerganov ggml-alloc : fix reuse-parent logic for misaligned sizes (llama/17884)
1da1a686
HerrCai0907 cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build…
324dd21d
ggerganov whisper : adjust to ggml changes (#0)
72714d16
ggerganov sync : ggml
48cdc06e
ggerganov talk-llama : sync llama.cpp
179d8b1c
ggerganov force pushed from 97a2f89e to 179d8b1c 124 days ago
ggerganov ggml : arm repack fix build (#0)
f0c9017a
ggerganov merged f0c9017a into master 124 days ago
