vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron …
b1777fc3
ggml-cuda: remove unnecessary prints on ggml_cuda_init (llama/18502)
e4b32ccd
cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion) (…
9de6e142
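The large-tensor copy fix above concerns byte counts that exceed `INT_MAX` and trip the `ggml_nbytes <= INT_MAX` assertion. One common way to stay within an `int`-typed size limit is to split the copy into bounded chunks; the helper below is a minimal illustrative sketch of that idea, not the actual ggml/CUDA code:

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstring>

// Copy `nbytes` bytes in chunks no larger than INT_MAX, so that each
// individual copy stays within the range of an int-typed size parameter.
// (Hypothetical helper for illustration; the real fix lives in ggml's
// CUDA backend and uses device copy APIs rather than memcpy.)
inline void copy_chunked(char * dst, const char * src, size_t nbytes) {
    size_t offset = 0;
    while (offset < nbytes) {
        const size_t chunk = std::min(nbytes - offset, (size_t) INT_MAX);
        std::memcpy(dst + offset, src + offset, chunk);
        offset += chunk;
    }
}
```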
rpc : use unordered_map::reserve and emplace (llama/18513)
e70bc419
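The rpc change above replaces plain insertions with `unordered_map::reserve` plus `emplace`: reserving sizes the hash table once up front (avoiding rehashes mid-loop), and `emplace` constructs each entry in place instead of copying a temporary pair. A generic sketch of the pattern (the data and function name here are illustrative, not the actual rpc code):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Build a lookup table from ids to names. reserve() allocates the
// bucket array once, so no rehashing happens during the loop, and
// emplace() constructs each key/value pair directly in the node.
inline std::unordered_map<uint64_t, std::string>
build_table(const std::vector<std::pair<uint64_t, std::string>> & items) {
    std::unordered_map<uint64_t, std::string> table;
    table.reserve(items.size());
    for (const auto & it : items) {
        table.emplace(it.first, it.second);
    }
    return table;
}
```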
metal : adjust extra size for FA buffer to avoid reallocations (llama…
868b0fdf
vulkan: Implement mmvq for iq1_s/iq1_m (llama/18450)
63266b0f
vulkan: Optimize GGML_OP_CUMSUM (llama/18417)
9b5c0cc4
ggml-hexagon: optimize activation function (llama/18393)
1059e04f
(Bugfix, ggml-cuda) Pool alloc count fix + small size computation typ…
4face37a
CUDA: only allocate FA tmp buffer if needed (llama/18564)
77a71748
ggml-cuda: fixes for concurrent streams (llama/18496)
e1f2d207
ggml-cuda: remove unused params in ggml_cuda_graph (llama/18579)
42102e2d
CUDA: disable cuda graph when using n-cpu-moe (llama/18593)
5f0ef6a8
sampling : add support for backend sampling (llama/17004)
6534b0e8
CANN: add operator fusion support for ADD + RMS_NORM (llama/17512)
36d41d86
vulkan: handle quantize_q8_1 overflowing the max workgroup count (lla…
8fdefeb7
vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (llama/18582)
f58ebe6c
ggml-cuda: check for srcs outside the cgraph (llama/18583)
f1cc70fa
CUDA: fix FA FP16 accumulator overflow for Granite (llama/18614)
30f7deb4
ggml webgpu: add CEIL operation support (llama/18605)
928e20c8
CANN: Make `valid_values` variable `static const` (llama/18627)
05ebbf8f
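Marking a function-local lookup table `static const`, as in the CANN change above, means it is initialized once on first use and shared by all calls, instead of being rebuilt on the stack every invocation. A minimal illustration (the contents of `valid_values` below are invented, not taken from the CANN backend):

```cpp
#include <algorithm>
#include <array>

// Hypothetical validity check: is `v` one of the accepted values?
// `static const` turns the table into a one-time initialization
// rather than a per-call stack array.
inline bool is_valid_value(int v) {
    static const std::array<int, 4> valid_values = {0, 1, 2, 4};
    return std::find(valid_values.begin(), valid_values.end(), v)
           != valid_values.end();
}
```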
ggml : fix avx512bf16 build (llama/18623)
8470e13e
mmq.cu: tune mmq/rocblas switching for RDNA (llama/18537)
4d6a7ee1
ggml-cuda: refactor cuda graph usage (llama/18637)
3963f7c5
vulkan: support buffer_from_host_ptr (llama/18467)
954f87a3
ggml : optimize cuda ssm_scan using warp-level reduction (llama/18505)
718693f2
Hexagon add support for f16/f32 flash attention, scale, set-rows and …
b67a0fc7
CANN: Rename `get_env` to `get_env_as_lowercase` (llama/18624)
0ed0e41b
CANN: Fix rename for get_env (llama/18652)
9c5a3dc5
vulkan: more mul mat optimizations (llama/18533)
92948849
vulkan: Warptile tuning for Intel Xe2/Xe3 (llama/18178)
517629e9
vulkan: reject ops when a tensor is too large to allocate (llama/18646)
f8f39015
cuda : fix build on cuda 12.8 (llama/18672)
b0a98658
opencl: add FILL op support (llama/18682)
f6c1a9dc
ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (llama/18535)
cc6d375a
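The `GGML_OP_OFFLOAD_MIN_BATCH` change above adds an environment-variable knob. A common way to implement such a knob is a `getenv` read with a numeric parse and a fallback default; the sketch below is an assumption about the shape of that logic (the parsing rules and default are hypothetical, not ggml's actual implementation):

```cpp
#include <cstdlib>

// Read an integer setting from the environment, falling back to
// `default_value` when the variable is unset or not a number.
// (Illustrative helper; ggml's real parsing may differ.)
inline int env_int_or(const char * name, int default_value) {
    const char * s = std::getenv(name);
    if (s == nullptr) {
        return default_value;
    }
    char * end = nullptr;
    const long v = std::strtol(s, &end, 10);
    if (end == s) {
        return default_value; // no digits parsed
    }
    return (int) v;
}
```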
metal : add MoE kernel specialization for ne20=5 (llama/18667)
d5fe8dda
vulkan: optimize ssm_scan (llama/18630)
cea84fa5
vulkan: fix push constant size for quantize_q8_1 (llama/18687)
ca716811
ggml webgpu: initial flashattention implementation (llama/18610)
d4c7a35e
ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (llama/18628)
a0670c78
llama: use host memory if device reports 0 memory (llama/18587)
c48c9664
Updates to webgpu get_memory (llama/18707)
55fd887c
opencl: add EXPM1 op (llama/18704)
f4ac80a8
Corrected: changed s13 = src1->nb[3] instead of nb[2] (llama/18724)
d2aceabb
cmake : update blas logic (llama/18205)
a17a360c
HIP: adjust RDNA3.5 MMQ kernel selection logic (llama/18666)
b70ed226
opencl: add SOFTPLUS op support (llama/18726)
818127bc
sync : ggml
500facbc
talk-llama : sync llama.cpp
7c4c31cf
danbev approved these changes on 2026-01-12
Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support …
b56b1f5a
vulkan: Disable large coopmat matmul configuration on proprietary AMD…
a3b19b27
vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id)…
7ab0bcb7
vulkan: change memory_logger to be controlled by an env var (llama/18…
a63ba570
CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (llama/18800)
784a46f3
sync : ggml
4230998e
ggerganov merged 47af2fb7 into master 50 days ago
ggerganov deleted the sync-ggml-26-01-14 branch 50 days ago