PR #10665 vulkan: small mul_mat_vec optimizations

0cc4m merged 44 commits into ggml-org:master from vulkan

dot and delta optimization

b7ad2345

server : fix default draft model parameters (#10586)

be2d0048

github : minify link [no ci]

ed8649f8

github : minify link [no ci] (revert)

ca7c2135

metal : small-batch mat-mul kernels (#10581)

d6753d70

readme : add option, update default value, fix formatting (#10271)

d37b7e09

llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636)

f697baf8

metal : add `GGML_OP_CONV_TRANSPOSE_1D` kernels (ggml/1026)

e92a46be

feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel (ggml/1019)

2b155906

CUDA: remove unnecessary warp reduce in FA (ggml/1032)

0df0452a

sync : ggml

69c7f204

scripts : remove amx sync

f8fe71ab

server : (web ui) Various improvements, now use vite as bundler (#10599)

70f0346f

vulkan: optimize and reenable split_k (#10637)

fa9abd6c

clip : add sycl support (#10574)

0fa9dc4c

Add docs for creating a static build (#10268) (#10630)

0a81a82f

Avoid using __fp16 on ARM with old nvcc (#10616)

9075271c

fix typo of README.md (#10605)

4153d57c

SYCL : Move to compile time oneMKL interface backend selection for NV…

e147054b

remove a multiply

062f256e

netrunnereve requested a review from 0cc4m 281 days ago

github-actions added the Vulkan and ggml labels

merge

fe811349

Merge https://github.com/ggerganov/llama.cpp into vulkan

c403d895

remove a multiply

5fbaf121

additional small optimizations

2f56bac7

Merge https://github.com/ggerganov/llama.cpp into vulkan

591894a0

Merge branch 'ggerganov:master' into vulkan

0b1b7c85

0cc4m commented on 2024-12-06

Merge branch 'vulkan' of https://github.com/netrunnereve/llama.cpp in…

4eefebc8

Merge branch 'ggerganov:master' into vulkan

32b994e8

Merge branch 'vulkan' of https://github.com/netrunnereve/llama.cpp in…

4b65c6b9

remove ifdefs

f5a15fc6

cleanup

bd17bc45

double the number of rows per workgroup

4a185ad3

Update ggml-vulkan.cpp

984d4707

Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgr…

595c1a7d

only increase the number of rows for amd and subgroup size 64

6de28665

merge

bfecabeb

fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

1c163674

0cc4m commented on 2024-12-08

Merge branch '0cc4m/vulkan-subgroup-size-control' of https://github.c…

8972f1d3

Merge https://github.com/ggerganov/llama.cpp into vulkan

c7bc42ce

use subgroup min and max to check for gcn (requires https://github.co…

9af9e801

netrunnereve marked this pull request as draft 275 days ago

manual merge ggml-vulkan.cpp

d9c6bf16

fix conflict

8b13f2d0

netrunnereve marked this pull request as ready for review 273 days ago

jeffbolznv approved these changes on 2024-12-12

netrunnereve marked this pull request as draft 273 days ago

set min and max subgroup size in any case

1aa26d78

netrunnereve marked this pull request as ready for review 273 days ago

Also double the number of rows for Intel GPUs

20b47d4d

0cc4m merged 64ae0655 into master 273 days ago

netrunnereve deleted the vulkan branch 272 days ago

Reviewers

jeffbolznv

0cc4m

Labels

Vulkan, ggml
