llama.cpp
vulkan: small mul_mat_vec optimizations #10665 (Merged)

Commits
  • dot and delta optimization
    netrunnereve committed 288 days ago
  • server : fix default draft model parameters (#10586)
    netrunnereve committed 286 days ago
  • github : minify link [no ci]
    netrunnereve committed 286 days ago
  • github : minify link [no ci] (revert)
    netrunnereve committed 286 days ago
  • metal : small-batch mat-mul kernels (#10581)
    netrunnereve committed 286 days ago
  • readme : add option, update default value, fix formatting (#10271)
    netrunnereve committed 286 days ago
  • llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636)
    netrunnereve committed 286 days ago
  • metal : add `GGML_OP_CONV_TRANSPOSE_1D` kernels (ggml/1026)
    netrunnereve committed 286 days ago
  • feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel (ggml/1019)
    netrunnereve committed 286 days ago
  • CUDA: remove unnecessary warp reduce in FA (ggml/1032)
    netrunnereve committed 286 days ago
  • sync : ggml
    netrunnereve committed 286 days ago
  • scripts : remove amx sync
    netrunnereve committed 286 days ago
  • server : (web ui) Various improvements, now use vite as bundler (#10599)
    netrunnereve committed 286 days ago
  • vulkan: optimize and reenable split_k (#10637)
    netrunnereve committed 286 days ago
  • clip : add sycl support (#10574)
    netrunnereve committed 286 days ago
  • Add docs for creating a static build (#10268) (#10630)
    netrunnereve committed 286 days ago
  • Avoid using __fp16 on ARM with old nvcc (#10616)
    netrunnereve committed 286 days ago
  • fix typo of README.md (#10605)
    netrunnereve committed 286 days ago
  • SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584)
    netrunnereve committed 286 days ago
  • remove a multiply
    netrunnereve committed 286 days ago
  • merge
    netrunnereve committed 286 days ago
  • Merge https://github.com/ggerganov/llama.cpp into vulkan
    netrunnereve committed 286 days ago
  • remove a multiply
    netrunnereve committed 286 days ago
  • additional small optimizations
    netrunnereve committed 286 days ago
  • Merge https://github.com/ggerganov/llama.cpp into vulkan
    netrunnereve committed 286 days ago
  • Merge branch 'ggerganov:master' into vulkan
    netrunnereve committed 285 days ago
  • Merge branch 'vulkan' of https://github.com/netrunnereve/llama.cpp into vulkan
    netrunnereve committed 284 days ago
  • Merge branch 'ggerganov:master' into vulkan
    netrunnereve committed 284 days ago
  • Merge branch 'vulkan' of https://github.com/netrunnereve/llama.cpp into vulkan
    netrunnereve committed 284 days ago
  • remove ifdefs
    netrunnereve committed 283 days ago
  • cleanup
    netrunnereve committed 283 days ago
  • double the number of rows per workgroup
    netrunnereve committed 283 days ago
  • Update ggml-vulkan.cpp
    netrunnereve committed 283 days ago
  • Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats
    0cc4m committed 283 days ago
  • only increase the number of rows for amd and subgroup size 64
    netrunnereve committed 283 days ago
  • merge
    netrunnereve committed 283 days ago
  • fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested
    netrunnereve committed 283 days ago
  • Merge branch '0cc4m/vulkan-subgroup-size-control' of https://github.com/ggerganov/llama.cpp into vulkan
    netrunnereve committed 282 days ago
  • Merge https://github.com/ggerganov/llama.cpp into vulkan
    netrunnereve committed 282 days ago
  • use subgroup min and max to check for gcn (requires https://github.com/ggerganov/llama.cpp/pull/10721)
    netrunnereve committed 282 days ago
  • manual merge ggml-vulkan.cpp
    netrunnereve committed 278 days ago
  • fix conflict
    netrunnereve committed 278 days ago
  • set min and max subgroup size in any case
    netrunnereve committed 278 days ago
  • Also double the number of rows for Intel GPUs
    0cc4m committed 278 days ago
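The "Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats" and "set min and max subgroup size in any case" commits refer to the standard VK_EXT_subgroup_size_control pattern. The sketch below is illustrative only, not the PR's actual ggml-vulkan.cpp code: it pins the subgroup size and requests fully populated subgroups when building a compute pipeline. The function name and parameters are hypothetical.

```cpp
// Hypothetical sketch of the VK_EXT_subgroup_size_control pattern; error handling omitted.
#include <vulkan/vulkan.h>

VkPipeline create_compute_pipeline_full_subgroups(
        VkDevice device, VkPipelineLayout layout,
        VkShaderModule shader, uint32_t required_subgroup_size) {
    // Pin the subgroup size the shader was tuned for (e.g. 32 or 64).
    VkPipelineShaderStageRequiredSubgroupSizeCreateInfoEXT subgroup_size_info = {};
    subgroup_size_info.sType =
        VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO_EXT;
    subgroup_size_info.requiredSubgroupSize = required_subgroup_size;

    VkPipelineShaderStageCreateInfo stage = {};
    stage.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stage.pNext  = &subgroup_size_info;
    // Ask the driver to launch only fully populated subgroups, which the
    // cooperative-matrix path relies on.
    stage.flags  = VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT;
    stage.stage  = VK_SHADER_STAGE_COMPUTE_BIT;
    stage.module = shader;
    stage.pName  = "main";

    VkComputePipelineCreateInfo info = {};
    info.sType  = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
    info.stage  = stage;
    info.layout = layout;

    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &info, nullptr, &pipeline);
    return pipeline;
}
```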
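The "use subgroup min and max to check for gcn", "double the number of rows per workgroup", and "only increase the number of rows for amd and subgroup size 64" commits follow one idea: GCN-era AMD GPUs report a fixed wave size (min == max == 64), so the mul_mat_vec dispatch can safely compute more output rows per workgroup there. The following is a minimal sketch of that check, not the PR's actual code; the struct, function names, and the row count of 2 are hypothetical, and the later Intel doubling is omitted for brevity.

```cpp
// Hypothetical sketch: detect a GCN-class AMD GPU from the reported subgroup
// size range and pick a rows-per-workgroup count for the matrix-vector kernel.
#include <vulkan/vulkan.h>
#include <cstdint>

struct subgroup_info {
    uint32_t min_size;
    uint32_t max_size;
    uint32_t vendor_id;
};

static subgroup_info query_subgroup_info(VkPhysicalDevice phys_dev) {
    VkPhysicalDeviceSubgroupSizeControlPropertiesEXT ctrl = {};
    ctrl.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_SIZE_CONTROL_PROPERTIES_EXT;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &ctrl;
    vkGetPhysicalDeviceProperties2(phys_dev, &props2);

    return { ctrl.minSubgroupSize, ctrl.maxSubgroupSize, props2.properties.vendorID };
}

// GCN cannot vary its wave size, so the reported range collapses to exactly 64.
static bool is_amd_gcn(const subgroup_info & info) {
    return info.vendor_id == 0x1002 && info.min_size == 64 && info.max_size == 64; // 0x1002 = AMD
}

// Choose how many output rows one workgroup computes in mul_mat_vec.
static uint32_t mul_mat_vec_rows_per_workgroup(const subgroup_info & info) {
    return is_amd_gcn(info) ? 2 : 1; // doubled on GCN, per the commits above
}
```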