llama.cpp
vulkan: small mul_mat_vec optimizations #10665 (Merged)

Commits
  • dot and delta optimization
    netrunnereve committed 288 days ago
  • server : fix default draft model parameters (#10586)
    netrunnereve committed 286 days ago
  • github : minify link [no ci]
    netrunnereve committed 286 days ago
  • github : minify link [no ci] (revert)
    netrunnereve committed 286 days ago
  • metal : small-batch mat-mul kernels (#10581)
    netrunnereve committed 286 days ago
  • readme : add option, update default value, fix formatting (#10271)
    netrunnereve committed 286 days ago
  • llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636)
    netrunnereve committed 286 days ago
  • metal : add `GGML_OP_CONV_TRANSPOSE_1D` kernels (ggml/1026)
    netrunnereve committed 286 days ago
  • feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel (ggml/1019)
    netrunnereve committed 286 days ago
  • CUDA: remove unnecessary warp reduce in FA (ggml/1032)
    netrunnereve committed 286 days ago
  • sync : ggml
    netrunnereve committed 286 days ago
  • scripts : remove amx sync
    netrunnereve committed 286 days ago
  • server : (web ui) Various improvements, now use vite as bundler (#10599)
    netrunnereve committed 286 days ago
  • vulkan: optimize and reenable split_k (#10637)
    netrunnereve committed 286 days ago
  • clip : add sycl support (#10574)
    netrunnereve committed 286 days ago
  • Add docs for creating a static build (#10268) (#10630)
    netrunnereve committed 286 days ago
  • Avoid using __fp16 on ARM with old nvcc (#10616)
    netrunnereve committed 286 days ago
  • fix typo of README.md (#10605)
    netrunnereve committed 286 days ago
  • SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584)
    netrunnereve committed 286 days ago
  • remove a multiply
    netrunnereve committed 286 days ago
  • merge
    netrunnereve committed 286 days ago
  • Merge https://github.com/ggerganov/llama.cpp into vulkan
    netrunnereve committed 286 days ago
  • remove a multiply
    netrunnereve committed 286 days ago
  • additional small optimizations
    netrunnereve committed 286 days ago
  • Merge https://github.com/ggerganov/llama.cpp into vulkan
    netrunnereve committed 286 days ago
  • Merge branch 'ggerganov:master' into vulkan
    netrunnereve committed 285 days ago
  • Merge branch 'vulkan' of https://github.com/netrunnereve/llama.cpp into vulkan
    netrunnereve committed 284 days ago
  • Merge branch 'ggerganov:master' into vulkan
    netrunnereve committed 284 days ago
  • Merge branch 'vulkan' of https://github.com/netrunnereve/llama.cpp into vulkan
    netrunnereve committed 284 days ago
  • remove ifdefs
    netrunnereve committed 283 days ago
  • cleanup
    netrunnereve committed 283 days ago
  • double the number of rows per workgroup
    netrunnereve committed 283 days ago
  • Update ggml-vulkan.cpp
    netrunnereve committed 283 days ago
  • Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats
    0cc4m committed 283 days ago
  • only increase the number of rows for amd and subgroup size 64
    netrunnereve committed 283 days ago
  • merge
    netrunnereve committed 283 days ago
  • fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested
    netrunnereve committed 283 days ago
  • Merge branch '0cc4m/vulkan-subgroup-size-control' of https://github.com/ggerganov/llama.cpp into vulkan
    netrunnereve committed 282 days ago
  • Merge https://github.com/ggerganov/llama.cpp into vulkan
    netrunnereve committed 282 days ago
  • use subgroup min and max to check for gcn (requires https://github.com/ggerganov/llama.cpp/pull/10721)
    netrunnereve committed 282 days ago
  • manual merge ggml-vulkan.cpp
    netrunnereve committed 278 days ago
  • fix conflict
    netrunnereve committed 278 days ago
  • set min and max subgroup size in any case
    netrunnereve committed 278 days ago
  • Also double the number of rows for Intel GPUs
    0cc4m committed 278 days ago
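The "Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats" and "set min and max subgroup size in any case" commits refer to the standard VK_EXT_subgroup_size_control pattern. The sketch below is illustrative only, not the PR's actual ggml-vulkan.cpp code: it pins the subgroup size and requests fully populated subgroups when building a compute pipeline. The function name and parameters are hypothetical.

```cpp
// Hypothetical sketch of the VK_EXT_subgroup_size_control pattern; error handling omitted.
#include <vulkan/vulkan.h>

VkPipeline create_compute_pipeline_full_subgroups(
        VkDevice device, VkPipelineLayout layout,
        VkShaderModule shader, uint32_t required_subgroup_size) {
    // Pin the subgroup size the shader was tuned for (e.g. 32 or 64).
    VkPipelineShaderStageRequiredSubgroupSizeCreateInfoEXT subgroup_size_info = {};
    subgroup_size_info.sType =
        VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO_EXT;
    subgroup_size_info.requiredSubgroupSize = required_subgroup_size;

    VkPipelineShaderStageCreateInfo stage = {};
    stage.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stage.pNext  = &subgroup_size_info;
    // Ask the driver to launch only fully populated subgroups, which the
    // cooperative-matrix path relies on.
    stage.flags  = VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT;
    stage.stage  = VK_SHADER_STAGE_COMPUTE_BIT;
    stage.module = shader;
    stage.pName  = "main";

    VkComputePipelineCreateInfo info = {};
    info.sType  = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
    info.stage  = stage;
    info.layout = layout;

    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &info, nullptr, &pipeline);
    return pipeline;
}
```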
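The "use subgroup min and max to check for gcn", "double the number of rows per workgroup", and "only increase the number of rows for amd and subgroup size 64" commits follow one idea: GCN-era AMD GPUs report a fixed wave size (min == max == 64), so the mul_mat_vec dispatch can safely compute more output rows per workgroup there. The following is a minimal sketch of that check, not the PR's actual code; the struct, function names, and the row count of 2 are hypothetical, and the later Intel doubling is omitted for brevity.

```cpp
// Hypothetical sketch: detect a GCN-class AMD GPU from the reported subgroup
// size range and pick a rows-per-workgroup count for the matrix-vector kernel.
#include <vulkan/vulkan.h>
#include <cstdint>

struct subgroup_info {
    uint32_t min_size;
    uint32_t max_size;
    uint32_t vendor_id;
};

static subgroup_info query_subgroup_info(VkPhysicalDevice phys_dev) {
    VkPhysicalDeviceSubgroupSizeControlPropertiesEXT ctrl = {};
    ctrl.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_SIZE_CONTROL_PROPERTIES_EXT;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &ctrl;
    vkGetPhysicalDeviceProperties2(phys_dev, &props2);

    return { ctrl.minSubgroupSize, ctrl.maxSubgroupSize, props2.properties.vendorID };
}

// GCN cannot vary its wave size, so the reported range collapses to exactly 64.
static bool is_amd_gcn(const subgroup_info & info) {
    return info.vendor_id == 0x1002 && info.min_size == 64 && info.max_size == 64; // 0x1002 = AMD
}

// Choose how many output rows one workgroup computes in mul_mat_vec.
static uint32_t mul_mat_vec_rows_per_workgroup(const subgroup_info & info) {
    return is_amd_gcn(info) ? 2 : 1; // doubled on GCN, per the commits above
}
```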