llama.cpp
Nomic Vulkan backend #4456
Merged

cebtenzzre merged 155 commits into master from ceb/nomic-vulkan
niansa Nomic vulkan backend licensed under the Software for Open Models Lice…
ba15dfd0
manyoso Remove warning which fails on windows.
48a45ea4
apage43 remove dynamic deps from kompute build
8563fa00
manyoso Switch to a dynamic dispatch table instead of linking hard against li…
45c8778b
manyoso Completely revamp how we do object management with the vulkan backend…
b7e2e691
apage43 Make kompute actually include external SDK headers when requested
beee5726
manyoso Throw an exception when allocation fails for vulkan.
68cf1df6
apage43 vulkan: disambiguate gpus with the same name
8bea7198
manyoso Don't try and install kompute artifacts.
bd5f6399
manyoso Sync from device back to host at begin of new prompt.
4ed25b2f
manyoso Only use vulkan with known quant that work.
68aca6be
manyoso Set the singleton to nullptr here.
addac252
manyoso Don't crash on available devices if we can't even create an instance.
2c24d67e
manyoso Support for gguf.
1b1416d7
cebtenzzre kompute : don't fail build because of -Warray-bounds
6b6c73a9
manyoso Upload immediately to device.
9e4f8b4a
manyoso Add a common boilerplate code via include and elim copy pasta
77135a3b
manyoso Consolidate code for mat x vec kernels and use subgroups more extensi…
93306f16
manyoso Move the subgroups and printf into common.
601905e7
manyoso Minor cleanup.
5509f743
manyoso Refactor getrows to use common code and get ready for q6_k.
4b223ec4
manyoso Add q6_k getrows and mul*vec kernel.
f1c9bc18
manyoso Fix offset into the qh and now we have working vulkan accelerated for…
06d4b215
manyoso Fixes for norm.
32289aa4
manyoso Fixup the upstream CMakelists.txt so we can build just llama.cpp with…
6ac39752
manyoso Change this back to be in agreement with metal and our previous softm…
de589ced
manyoso Fixes for subgroup size to bring AMD and NVIDIA inline with eachother…
bc4b5ed1
cebtenzzre kompute : only try to use Vulkan for LLaMA itself
24a4a595
cebtenzzre kompute : remove Q6_K from list of supported quant types
3d850db7
apage43 f16 mv broadcasting fix (gqa fix)
9db90cbe
apage43 q8 mat*vec
ff4212d2
apage43 vulkan: implement neox mode for rope
020b1745
apage43 falcon h2d + reenable vulkan
8564f790
manyoso Delete TODO now that we have q8_0.
09d83f04
apage43 add mat*mat ops
f0cd38b9
apage43 misc vulkan cleanup
46385ee0
apage43 perf: use bigger threadgroups in mm
3327d84a
apage43 use op param epsilon for norms
d5741c07
apage43 q6k mm works
b78a94bc
apage43 rm commented dbg print
4809890d
apage43 q4_1 mat*mat
cd0257ed
apage43 clean up vulkan/cpu switch
8dc79ac3
apage43 attempted speedups
9bc52eba
apage43 attempted speedups 2
c1fd6454
apage43 use mat*vec shaders for mat*mat
cc05a602
cebtenzzre kompute : enable kp_logger and make it static (#8)
21841d31
cebtenzzre kompute : make scripts executable
cbc0d1af
manyoso Don't try an allocation on a heap that is smaller than the size we re…
84000153
manyoso Remove unused push constant that was giving validation errors.
752f7ebd
manyoso Lower the workgroup count for some shaders by providing a loop that p…
8d9efbf9
manyoso Fix synchronization problem for AMD Radeon with amdvlk driver or windows
74ddf0f1
cebtenzzre vulkan : fix missing break in matmul selection (#9)
1c170101
cebtenzzre llama : decide to disable Vulkan before loading tensors (#7)
89b71278
manyoso Scale the workgroup count down to allow correct generation for falcon…
e006d377
manyoso Revert the prompt processing on gpu for now.
a5eb001e
manyoso Remove this debug code.
ffd0624b
cebtenzzre llama : fix Vulkan whitelist (#11)
f88b1988
cebtenzzre kompute : fix issues with debug layers
a8cac532
cebtenzzre fix build with external fmtlib (v10)
c438c168
cebtenzzre Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD
af00cca0
cebtenzzre vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)
71565eb0
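The masking change above folds causal masking into an ordinary addition: instead of a dedicated ggml_diag_mask_inf op, a mask tensor holding -inf above the diagonal is added to the attention scores before softmax, so the masked logits vanish under exp(). A minimal Python sketch of the general technique (function names are illustrative, not llama.cpp code):

```python
import math

def causal_mask(n):
    # 0 on and below the diagonal, -inf above it (future positions).
    return [[0.0 if j <= i else -math.inf for j in range(n)] for i in range(n)]

def masked_softmax(scores):
    # Adding the mask drives masked logits to -inf, so exp() turns them into 0.
    mask = causal_mask(len(scores))
    out = []
    for row, mrow in zip(scores, mask):
        shifted = [s + m for s, m in zip(row, mrow)]
        mx = max(shifted)           # subtract the max for numerical stability
        exps = [math.exp(s - mx) for s in shifted]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

Because the mask is just another tensor added with ggml_add, no special-purpose kernel is needed on the backend.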
cebtenzzre vulkan : rope n_past is now KQ_pos, f16 rope kernel
84f7fc45
cebtenzzre vulkan : optimize workgroup sizes
39abedd1
cebtenzzre Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vu…
f194e1b6
cebtenzzre vulkan : assert various kernel requirements
a934b2cb
cebtenzzre Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vu…
2a41ba72
cebtenzzre vulkan : handle ggml_scale for n%8 != 0
6474fc87
cebtenzzre Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vu…
fe26e6ad
cebtenzzre mention skipped change
9c4dfd06
cebtenzzre merge fixup (e16b9fa4baa8a09c6619b116159830e898050942)
02c3309f
cebtenzzre Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-v…
1829f1d7
cebtenzzre vulkan : implement YaRN RoPE scaling (#2268)
208cd52f
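The rope kernels touched in the surrounding commits compute rotary position embeddings (RoPE). As background, a scalar sketch of plain RoPE, which rotates each (even, odd) pair of vector components by a position-dependent angle; YaRN scaling layers extra frequency rescaling on top of this. The function name and layout here are illustrative, not taken from llama.cpp:

```python
import math

def rope(x, pos, base=10000.0):
    # Rotate each (even, odd) pair of components by an angle that grows
    # with the token position and shrinks with the pair index.
    d = len(x)
    out = x[:]
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Each pairwise rotation is norm-preserving, which is why RoPE can be applied to Q and K in place.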
cebtenzzre Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d' into nomic-vu…
23f6d51f
cebtenzzre vulkan : sync with "migrate to dynamic graphs"
a4bb9c5c
cebtenzzre Merge remote-tracking branch 'upstream/master' into nomic-vulkan-redo
9ae88baf
cebtenzzre relicense Vulkan backend as MIT
56430c32
cebtenzzre rename ggml-vulkan -> ggml-kompute
3e09e127
cebtenzzre separate shaders from kompute itself
27631dbb
cebtenzzre Merge commit '81bc9214a389362010f7a57f4cbc30e5f83a2d28' into nomic-vu…
747e1eaf
cebtenzzre kompute : fix compile warnings
b906e126
cebtenzzre move kompute to a submodule
9af7f58b
cebtenzzre remove script with unclear purpose
f7cb0a65
cebtenzzre requested a review from ggerganov 2 years ago
cebtenzzre added enhancement
ggerganov commented on 2023-12-14
cebtenzzre ggml : restore 'static' specifiers
c8fd4ba8
ggerganov commented on 2023-12-14
ggerganov added need feedback
cebtenzzre refactor llama.cpp modifications
f58f581c
cebtenzzre commented on 2023-12-15
cebtenzzre vulkan : fix free of stack addr in llama_buffer
2d2c76ac
cebtenzzre kompute : always destroy Manager via the destructor
80727062
cebtenzzre kompute : fix -Wunused-private-field warnings from clang
44b1a97a
cebtenzzre Merge commit 'bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56' into ceb/nomi…
8b65f4c5
cebtenzzre Merge commit '31f27758faf4a4bd08101a57c7ec3a473f771f86' into ceb/nomi…
3959283e
cebtenzzre sync xxd commands with GPT4All llama.cpp.cmake
904c563d
cebtenzzre Merge commit 'd232aca5a73b290e218a2e48b91023d5e994203f' into ceb/nomi…
ae6d6824
cebtenzzre Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
3773e1af
cebtenzzre marked this pull request as draft 2 years ago
ggerganov commented on 2024-01-17
cebtenzzre force pushed to 1637294a 2 years ago
cebtenzzre Merge commit 'e7e4df031b9e29d4b55a4e0b0295187f6b213db1' into HEAD
7c527eb5
cebtenzzre kompute : initial attempt at ggml-backend v2 support
298d6eec
cebtenzzre fix assertion failure
5f660dad
cebtenzzre attempt to get test-backend-ops working
070919db
cebtenzzre add sanity check and fix kompute teardown order
cad72e12
cebtenzzre kompute : ignore exceptions in ggml_vk_available_devices (#12)
76474a7c
ggerganov kompute : fix rope_f32 and scale ops (#5008)
d6bd4716
cebtenzzre clean up old backend code
9431026a
cebtenzzre actually fix this assertion
e9d5223d
cebtenzzre sync op_rope_f16 with recent op_rope_f32 changes
729e1a4c
cebtenzzre never try to evaluate an empty command buffer
07530731
cebtenzzre fix supported ops for kompute backend
2f6a279e
ggerganov kompute : fix ggml_add kernel (#5027)
33e8d6ab
cebtenzzre minor cleanup
cb9ceff9
cebtenzzre kompute : fix get_rows dispatch -> 4 less failures
0899adf8
cebtenzzre kompute : fix op_mul kernel -> 13 less test failures
08e23fd7
cebtenzzre kompute : fix more dispatch ambiguity -> 12 less failures
2755ae3d
cebtenzzre kompute : remove broken mulrow kernel -> 1 less test failure
0787b80d
cebtenzzre fix q4_0/q4_1 mmv, 65 -> 49 failures
1a14099c
cebtenzzre fix f16 mmv, 49 -> 41 failures
2b0f642f
cebtenzzre force pushed from 1637294a to 2b0f642f 2 years ago
cebtenzzre test-backend-ops : add llama test
2852902e
cebtenzzre test-backend-ops : test scale parameter of ggml_soft_max_ext
14509660
cebtenzzre kompute : support scale parameter of softmax
308f2796
cebtenzzre test-backend-ops : test mask parameter of ggml_soft_max_ext
8bd38fe3
cebtenzzre kompute : support mask parameter of softmax
df687b10
cebtenzzre test-backend-ops : test llama with different batch sizes
ebb5f7e9
cebtenzzre test-backend-ops : increase max_nmse_err so Llama passes
ec68a965
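The max_nmse_err threshold raised above bounds the normalized mean squared error between a backend's output and the reference CPU output in test-backend-ops. Assuming the standard definition of NMSE, a sketch of the metric (not the actual test-harness code):

```python
def nmse(ref, approx):
    # Normalized mean squared error: squared error relative to the
    # energy of the reference output; 0.0 means an exact match.
    num = sum((a - b) ** 2 for a, b in zip(ref, approx))
    den = sum(a * a for a in ref)
    return num / den
```

Normalizing by the reference's energy makes one tolerance usable across ops whose outputs differ wildly in scale.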
cebtenzzre kompute : fix algorithm names
987335ea
cebtenzzre kompute : fix q8_0 mmv, 41 -> 28 failures
f5ac6354
cebtenzzre test-backend-ops : add Falcon test
1849b854
cebtenzzre test-backend-ops : test larger GELU range
6fc99a6e
cebtenzzre kompute : fix op_gelu -> Falcon is working on AMDVLK
38d1f0c7
cebtenzzre test-backend-ops : restore softmax tests
11b30508
cebtenzzre kompute : fix basic f16 get_rows, 28 -> 26 failures
de9fba0d
cebtenzzre kompute : fix basic Q6_K get_rows, 26 -> 24 failures
445a3734
cebtenzzre kompute : mark last few failing ops as unsupported
3fbf0529
cebtenzzre test-backend-ops : make Falcon test faster with a smaller model
39151942
cebtenzzre marked this pull request as ready for review 2 years ago
cebtenzzre kompute : remove unused immintrin.h #include
bc287047
cebtenzzre kompute : fix a -Wstrict-aliasing warning
91654ff0
ggerganov added high priority
ggerganov approved these changes on 2024-01-26
ggerganov requested a review from slaren 2 years ago
slaren commented on 2024-01-26
cebtenzzre kompute : remove unnecessary use_mmap=false
61a5cf88
cebtenzzre llama : revert unintended whitespace change
e6ce5f21
cebtenzzre Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
aea84989
cebtenzzre test-backend-ops : comment out Llama and Falcon tests
2512799c
slaren test-backend-ops : check all the ops in the test for support in the b…
8ca33dec
cebtenzzre kompute : init device automatically and remove an unnecessary free
6af02b19
cebtenzzre ggml-kompute.h : remove anything that doesn't need to be public
2ff2d161
cebtenzzre kompute : fix #includes
cdab4043
cebtenzzre op_mul_mat_mat_f32.comp : fix missing final newline
454baeba
cebtenzzre editorconfig-checker : exclude .gitmodules
297fde5f
slaren commented on 2024-01-26
cebtenzzre ci : initial attempt at testing Kompute backend
91324851
cebtenzzre main : remove ggml-kompute.h #include
57cecad1
cebtenzzre kompute : adapt ggml-kompute API to be compatible with C
4b0c96a9
cebtenzzre ci : attempt to fix Vulkan installer path
e6edd44d
cebtenzzre ci : do not run tests for Kompute (no GPU)
050d4502
cebtenzzre kompute : use llama_backend_init/llama_backend_free to manage device
53046255
slaren commented on 2024-01-27
cebtenzzre kompute : better device management
be7c0559
cebtenzzre Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
da1dc666
cebtenzzre kompute : fix merge issues
dc08e512
cebtenzzre kompute : remove llama_load_model_from_file_internal
7e11fe08
cebtenzzre requested a review from slaren 2 years ago
cebtenzzre vulkan : correctly fix use-after-free in ggml_vk_current_device
b932cd74
cebtenzzre minor fixup
48db724b
cebtenzzre fix trailing whitespace
1f98dff7
cebtenzzre fix incorrect memcpy
29982114
slaren approved these changes on 2024-01-29
cebtenzzre merged fbf1ddec into master 2 years ago
cebtenzzre deleted the ceb/nomic-vulkan branch 2 years ago
