ggml : remove dirty flag from version string (ggml/1391)
41831494
ggml : add missing AVX512 feature checks (llama/17270)
bb88c254
cmake : fix ARM feature verification (llama/17170)
9e429c47
vulkan: add log RTE support to fix Nvidia CI (llama/17320)
b7dfced3
vulkan: support noncontig i32 copy (llama/17328)
24b981ef
CANN: fix acl_tensor_ptr usage in ASCEND_310P ROPE (llama/17347)
c137d11b
ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (llama…
27c69271
vulkan: force full subgroups for flash attention to fix intel subgrou…
2097a9c1
Fix too relaxed check on CUDA "fast copy" (can_be_transposed) conditi…
746cbed2
cuda: fix rope fusion for gemma3 (llama/17378)
73d39682
vulkan: Add copy_transpose shader (llama/17371)
ae8865c6
vulkan: support larger argsort (llama/17313)
95d0b0b0
vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FL…
24b14cad
ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (…
1d3a5250
kleidiai: fix zero-size array declaration (llama/17240)
51f54380
ggml : remove useless and error-prone variadic macros (llama/17399)
2f20938b
DGX Spark: UMA support (llama/17368)
510805e6
ggml : Fix transposed SOLVE_TRI result (llama/17323)
46f893c2
ggml-hexagon: fix swiglu failure at `test-backend-ops` (llama/17344)
cb3ee1b0
CANN: Refactor `evaluate_and_capture_cann_graph` (llama/17333)
a009dc17
vulkan: disable async for older Intel devices (llama/17369)
cdc1a776
opencl: refine condition for kqv mm (llama/17392)
5c0e4a9c
HIP: RDNA4 tensor core support for MMF (llama/17077)
fc6eae78
vulkan: remove a couple unnecessary switches (llama/17419)
deb4958a
cuda : support non-contiguous i32 to i32 copy (llama/17326)
61e0b7ed
ggml-hexagon: add `hex_supported_buffer` for better buffer supported …
621cb871
ggml-hexagon: Initial Hexagon v68/v69 support (llama/17394)
75cea7f8
CANN: Define `cann_graph_update_required` before macro (llama/17434)
5ed0ddc4
hexagon: add support for ROPE_NEOX (llama/17458)
77d874b1
ggml: add RISC-V cpu-feats (llama/17461)
faf37ffe
ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (ll…
f4ede89d
HIP: WMMA-MMQ kernels for RDNA 4 (llama/17156)
371a2186
vulkan: more FA details in vk_perf_logger (llama/17443)
553d57a4
vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 …
273e4fe7
CANN: supports out_prod operator for F32 and F16 (llama/17406)
e00bb753
ggml : add ggml_top_k (llama/17365)
968db8bc
vulkan: Implement GGML_OP_CUMSUM (llama/17479)
20845004
CANN: Add MROPE and IMROPE support (llama/17401)
f0c54d47
HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (llama/17502)
bb7223da
ggml : fix ARM feature verification (llama/17519)
8e3560c7
ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (llama/17448)
fb31a197
vulkan: Implement top-k (llama/17418)
d8b61e05
vulkan: allow graph_optimize for prompt processing workloads (llama/1…
c8050e5f
Fix chunks being too small with small matrix sizes (llama/17526)
3de43724
opencl: add sqr, sqrt, mean and ssm_conv (llama/17476)
74ef5dd1
vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (llama/17514)
310db24f
vulkan : move contiguous checks to device_supports_op (llama/17490)
ac92424b
ggml-cpu: aarm64: q4_K repack gemm and gemv implementations (dotprod …
93f6cdb9
cuda : fix UMA detection on discrete GPUs. (llama/17537)
e682af78
vulkan: Implement SOLVE_TRI (llama/17486)
3727a36c
refactor pad_reflect_1d to make the UT case pass (llama/17204)
93bc8dc5
SOLVE_TRI CUDA kernel for small matrices (llama/17457)
51e842d1
HIP: enable mul_mat_f for RDNA4 (llama/17437)
f92d542d
rpc : cache and reuse compute graphs (llama/15405)
d26d1c8b
vulkan: Implement GGML_OP_TRI (llama/17503)
7a209631
CUDA: no FP16 arithmetic for vector FA kernel (llama/17558)
37e4c2ed
model : Qwen3 Next (llama/16095)
43441ff5
ggml-cuda: add stricter checking for fusion (llama/17568)
90ca4e0a
enable fp16/fast_fp16/bf16_mma on PH1 (llama/17551)
c372bdbb
ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in g…
463003e7
vulkan: improve topk perf for large k, fix overflow in unit tests (ll…
dbf8766f
Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (llama/16900)
2fcc0a3a
ggml: replace hwcap with riscv_hwprobe for RVV detection (llama/17567)
28dff065
sycl : support to malloc memory on device more than 4GB, update the d…
a3459484
vulkan : fix FA mask load with bounds check (coopmat2) (llama/17606)
2258930c
cuda : add error checking for cudaMemcpyAsync in argsort (llama/17599)
2e4a7a21
CUDA: add stream-based concurrency (llama/16991)
e68ee6e2
ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` (llama/17581)
70664720
model: LFM2-VL fixes (llama/17577)
0defeee6
llama-graph: avoid expand_forward for fusion (llama/17633)
6cc2d053
ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler …
7cd3de89
metal : add FA head size 48 (llama/17619)
32090930
enhance argsort for UT (llama/17573)
26732d28
ggml-cuda: reorder only relevant nodes (llama/17639)
4c89232b
ggml : add fallback definition for HWCAP2_SVE2 (llama/17683)
e2537b4a
ggml : remove redundant n_copies check when setting input/output (lla…
201b9107
CANN: Disable Ger operator of OUT_PROD on 310p device (llama/17563)
a64d46a5
ggml : use svcntb() for SVE vector length detection (llama/17474)
16688c6d
cmake : add utf8 compilation options for msvc (llama/17682)
fffdf679
vulkan: Reduce temporary memory usage for TOP_K (llama/17623)
86cb5ab9
ggml webgpu: add support for emscripten builds (llama/17184)
d263bdbf
metal : fix data race in pipeline library (llama/17731)
4a00f2e3
CUDA: generalized (mma) FA, add Volta support (llama/17505)
7adbcafb
ggml-cpu: remove duplicate conditional check 'iid' (llama/17650)
3794a0d3
build : move _WIN32_WINNT definition to headers (llama/17736)
92e50155
metal : use params per pipeline instance (llama/17739)
194d0164
ggml-cpu : remove asserts always evaluating to false (llama/17728)
f96ebc92
metal: TRI, FILL, EXPM1, SOFTPLUS (llama/16623)
8902c9d9
Add support for CUMSUM and TRI for CUDA. (llama/17584)
8d44d618
HIP: enable WMMA-MMQ INT kernels for RDNA 3 (llama/17576)
e3f3c6ea
CUDA: fix FA VKQ accumulator overflow (llama/17746)
14502d65
Q4/Q8 Tiled Gemm Optimization. (llama/16999)
d30b7440
HIP : fix RDNA4 build (llama/17792)
4170159d
metal : add residency sets keep-alive heartbeat (llama/17766)
322903fa
rpc : fix alloc size logic (llama/17116)
aefcd75f
vulkan: set all memory allocations to high priority (llama/17624)
32ba1ec8
vulkan: enable mmvq for q2_k on NVIDIA (llama/17675)
7e97d3b0
ggml webgpu: unary op support, code refactoring, ops support (llama/…
23984be4
vulkan : support conv-2d with large output size (llama/17685)
0b53759b
vulkan: fix top_k bug when there are ties in the input (llama/17659)
0484147a
vulkan: add more num_blocks instantiations in rms_norm (llama/17701)
64a3f573
vulkan: Fix mismatch in TOPK_MOE unit test (llama/17541)
191e5f46
vulkan: Replace deprecated VK_EXT_validation_features (llama/17637)
a8d02735
metal : fix build (#17799)
41cf229d
vulkan: support solve_tri with larger N/K values (llama/17781)
875d8614
vulkan: Use one row per workgroup for f32 mmv (llama/17711)
c66c71e9
ggml : improve error handling for search path existence checks (llama…
b67e3abd
HIP: fix RDNA3 FP16/BF16 matrix multiplication (llama/17817)
94be7191
ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU …
c5e18070
ggml-zendnn : add ZenDNN backend for AMD CPUs (llama/17690)
ebff8f9d
vulkan: perf_logger improvements (llama/17672)
898f876f
sycl: add missing BF16 conversion support for Intel oneAPI (llama/17780)
447ef863
Vulkan: improve mul_mat_vec_iq1_m (llama/16907)
d6d44fac
ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (llama/1…
c8d0ee2f
cuda: optimize SOLVE_TRI using registers and FMAF (llama/17703)
e1562e85
cuda : add FILL op support (llama/17851)
821c2071
CUDA: fix FP16 overflow in tile FA kernel (llama/17875)
bef1f5a5
CANN: add support for partial RoPE and Vision mode (llama/17543)
79d86a5c
ggml : allow fill node alloc inplace (llama/17870)
ba463fb5
metal : print node names for debugging (llama/17882)
b6ae0b29
ggml : Provide macos-specific backtrace printing to avoid terminal de…
41bbc034
Add DIAG for CUDA (llama/17873)
2817582b
metal: SSM kernel improvements (llama/17876)
307dc525
fix softmax for iGPU (llama/17838)
c10b4f9a
CUDA: fix unpadded strides in MMA FA kernel (llama/17891)
ea182913
cuda : add missing support check for xielu (llama/17895)
ca8ea18d
ggml : remove GGML_KQ_MASK_PAD constant (llama/17910)
cd9b8c6d
Fix race conditions in threadpool when dealing with dynamic/frequent …
a2886fba
ggml-hexagon: fix `rope` failure at `test-backend-ops` (llama/17565)
0c88de5c
ggml-alloc : fix reuse-parent logic for misaligned sizes (llama/17884)
1da1a686
cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build…
324dd21d
whisper : adjust to ggml changes (#0)
72714d16
sync : ggml
48cdc06e
talk-llama : sync llama.cpp
179d8b1c
ggerganov force pushed from 97a2f89e to 179d8b1c 124 days ago
ggml : arm repack fix build (#0)
f0c9017a
ggerganov merged f0c9017a into master 124 days ago