llama.cpp
Nomic Vulkan backend #4456 (Merged)
cebtenzzre merged 155 commits into master from ceb/nomic-vulkan
ba15dfd0  Nomic vulkan backend licensed under the Software for Open Models Lice…
48a45ea4  Remove warning which fails on windows.
8563fa00  remove dynamic deps from kompute build
45c8778b  Switch to a dynamic dispatch table instead of linking hard against li…
b7e2e691  Completely revamp how we do object management with the vulkan backend…
beee5726  Make kompute actually include external SDK headers when requested
68cf1df6  Throw an exception when allocation fails for vulkan.
8bea7198  vulkan: disambiguate gpus with the same name
bd5f6399  Don't try and install kompute artifacts.
4ed25b2f  Sync from device back to host at begin of new prompt.
68aca6be  Only use vulkan with known quant that work.
addac252  Set the singleton to nullptr here.
2c24d67e  Don't crash on available devices if we can't even create an instance.
1b1416d7  Support for gguf.
6b6c73a9  kompute : don't fail build because of -Warray-bounds
9e4f8b4a  Upload immediately to device.
77135a3b  Add a common boilerplate code via include and elim copy pasta
93306f16  Consolidate code for mat x vec kernels and use subgroups more extensi…
601905e7  Move the subgroups and printf into common.
5509f743  Minor cleanup.
4b223ec4  Refactor getrows to use common code and get ready for q6_k.
f1c9bc18  Add q6_k getrows and mul*vec kernel.
06d4b215  Fix offset into the qh and now we have working vulkan accelerated for…
32289aa4  Fixes for norm.
6ac39752  Fixup the upstream CMakelists.txt so we can build just llama.cpp with…
de589ced  Change this back to be in agreement with metal and our previous softm…
bc4b5ed1  Fixes for subgroup size to bring AMD and NVIDIA inline with eachother…
24a4a595  kompute : only try to use Vulkan for LLaMA itself
3d850db7  kompute : remove Q6_K from list of supported quant types
9db90cbe  f16 mv broadcasting fix (gqa fix)
ff4212d2  q8 mat*vec
020b1745  vulkan: implement neox mode for rope
8564f790  falcon h2d + reenable vulkan
09d83f04  Delete TODO now that we have q8_0.
f0cd38b9  add mat*mat ops
46385ee0  misc vulkan cleanup
3327d84a  perf: use bigger threadgroups in mm
d5741c07  use op param epsilon for norms
b78a94bc  q6k mm works
4809890d  rm commented dbg print
cd0257ed  q4_1 mat*mat
8dc79ac3  clean up vulkan/cpu switch
9bc52eba  attempted speedups
c1fd6454  attempted speedups 2
cc05a602  use mat*vec shaders for mat*mat
21841d31  kompute : enable kp_logger and make it static (#8)
cbc0d1af  kompute : make scripts executable
84000153  Don't try an allocation on a heap that is smaller than the size we re…
752f7ebd  Remove unused push constant that was giving validation errors.
8d9efbf9  Lower the workgroup count for some shaders by providing a loop that p…
74ddf0f1  Fix synchronization problem for AMD Radeon with amdvlk driver or windows
1c170101  vulkan : fix missing break in matmul selection (#9)
89b71278  llama : decide to disable Vulkan before loading tensors (#7)
e006d377  Scale the workgroup count down to allow correct generation for falcon…
a5eb001e  Revert the prompt processing on gpu for now.
ffd0624b  Remove this debug code.
f88b1988  llama : fix Vulkan whitelist (#11)
a8cac532  kompute : fix issues with debug layers
c438c168  fix build with external fmtlib (v10)
af00cca0  Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD
71565eb0  vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)
84f7fc45  vulkan : rope n_past is now KQ_pos, f16 rope kernel
39abedd1  vulkan : optimize workgroup sizes
f194e1b6  Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vu…
a934b2cb  vulkan : assert various kernel requirements
2a41ba72  Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vu…
6474fc87  vulkan : handle ggml_scale for n%8 != 0
fe26e6ad  Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vu…
9c4dfd06  mention skipped change
02c3309f  merge fixup (e16b9fa4baa8a09c6619b116159830e898050942)
1829f1d7  Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-v…
208cd52f  vulkan : implement YaRN RoPE scaling (#2268)
23f6d51f  Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d' into nomic-vu…
a4bb9c5c  vulkan : sync with "migrate to dynamic graphs"
9ae88baf  Merge remote-tracking branch 'upstream/master' into nomic-vulkan-redo
56430c32  relicense Vulkan backend as MIT
3e09e127  rename ggml-vulkan -> ggml-kompute
27631dbb  separate shaders from kompute itself
747e1eaf  Merge commit '81bc9214a389362010f7a57f4cbc30e5f83a2d28' into nomic-vu…
b906e126  kompute : fix compile warnings
9af7f58b  move kompute to a submodule
f7cb0a65  remove script with unclear purpose
cebtenzzre requested a review from ggerganov (2 years ago)
cebtenzzre added the enhancement label
ggerganov commented on 2023-12-14
c8fd4ba8  ggml : restore 'static' specifiers
ggerganov commented on 2023-12-14
ggerganov added the need feedback label
f58f581c  refactor llama.cpp modifications
cebtenzzre commented on 2023-12-15
2d2c76ac  vulkan : fix free of stack addr in llama_buffer
80727062  kompute : always destroy Manager via the destructor
44b1a97a  kompute : fix -Wunused-private-field warnings from clang
8b65f4c5  Merge commit 'bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56' into ceb/nomi…
3959283e  Merge commit '31f27758faf4a4bd08101a57c7ec3a473f771f86' into ceb/nomi…
904c563d  sync xxd commands with GPT4All llama.cpp.cmake
ae6d6824  Merge commit 'd232aca5a73b290e218a2e48b91023d5e994203f' into ceb/nomi…
3773e1af  Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre marked this pull request as draft (2 years ago)
ggerganov commented on 2024-01-17
cebtenzzre force-pushed to 1637294a (2 years ago)
7c527eb5  Merge commit 'e7e4df031b9e29d4b55a4e0b0295187f6b213db1' into HEAD
298d6eec  kompute : initial attempt at ggml-backend v2 support
5f660dad  fix assertion failure
070919db  attempt to get test-backend-ops working
cad72e12  add sanity check and fix kompute teardown order
76474a7c  kompute : ignore exceptions in ggml_vk_available_devices (#12)
d6bd4716  kompute : fix rope_f32 and scale ops (#5008)
9431026a  clean up old backend code
e9d5223d  actually fix this assertion
729e1a4c  sync op_rope_f16 with recent op_rope_f32 changes
07530731  never try to evaluate an empty command buffer
2f6a279e  fix supported ops for kompute backend
33e8d6ab  kompute : fix ggml_add kernel (#5027)
cb9ceff9  minor cleanup
0899adf8  kompute : fix get_rows dispatch -> 4 less failures
08e23fd7  kompute : fix op_mul kernel -> 13 less test failures
2755ae3d  kompute : fix more dispatch ambiguity -> 12 less failures
0787b80d  kompute : remove broken mulrow kernel -> 1 less test failure
1a14099c  fix q4_0/q4_1 mmv, 65 -> 49 failures
2b0f642f  fix f16 mmv, 49 -> 41 failures
cebtenzzre force-pushed from 1637294a to 2b0f642f (2 years ago)
2852902e  test-backend-ops : add llama test
14509660  test-backend-ops : test scale parameter of ggml_soft_max_ext
308f2796  kompute : support scale parameter of softmax
8bd38fe3  test-backend-ops : test mask parameter of ggml_soft_max_ext
df687b10  kompute : support mask parameter of softmax
ebb5f7e9  test-backend-ops : test llama with different batch sizes
ec68a965  test-backend-ops : increase max_nmse_err so Llama passes
987335ea  kompute : fix algorithm names
f5ac6354  kompute : fix q8_0 mmv, 41 -> 28 failures
1849b854  test-backend-ops : add Falcon test
6fc99a6e  test-backend-ops : test larger GELU range
38d1f0c7  kompute : fix op_gelu -> Falcon is working on AMDVLK
11b30508  test-backend-ops : restore softmax tests
de9fba0d  kompute : fix basic f16 get_rows, 28 -> 26 failures
445a3734  kompute : fix basic Q6_K get_rows, 26 -> 24 failures
3fbf0529  kompute : mark last few failing ops as unsupported
39151942  test-backend-ops : make Falcon test faster with a smaller model
cebtenzzre marked this pull request as ready for review (2 years ago)
bc287047  kompute : remove unused immintrin.h #include
91654ff0  kompute : fix a -Wstrict-aliasing warning
ggerganov added the high priority label
ggerganov approved these changes on 2024-01-26
ggerganov requested a review from slaren (2 years ago)
slaren commented on 2024-01-26
61a5cf88  kompute : remove unnecessary use_mmap=false
e6ce5f21  llama : revert unintended whitespace change
aea84989  Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
2512799c  test-backend-ops : comment out Llama and Falcon tests
8ca33dec  test-backend-ops : check all the ops in the test for support in the b…
6af02b19  kompute : init device automatically and remove an unnecessary free
2ff2d161  ggml-kompute.h : remove anything that doesn't need to be public
cdab4043  kompute : fix #includes
454baeba  op_mul_mat_mat_f32.comp : fix missing final newline
297fde5f  editorconfig-checker : exclude .gitmodules
slaren left two comments on 2024-01-26
91324851  ci : initial attempt at testing Kompute backend
57cecad1  main : remove ggml-kompute.h #include
4b0c96a9  kompute : adapt ggml-kompute API to be compatible with C
e6edd44d  ci : attempt to fix Vulkan installer path
050d4502  ci : do not run tests for Kompute (no GPU)
53046255  kompute : use llama_backend_init/llama_backend_free to manage device
slaren commented on 2024-01-27
be7c0559  kompute : better device management
da1dc666  Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
dc08e512  kompute : fix merge issues
7e11fe08  kompute : remove llama_load_model_from_file_internal
cebtenzzre requested a review from slaren (2 years ago)
b932cd74  vulkan : correctly fix use-after-free in ggml_vk_current_device
48db724b  minor fixup
1f98dff7  fix trailing whitespace
29982114  fix incorrect memcpy
slaren approved these changes on 2024-01-29
cebtenzzre merged fbf1ddec into master (2 years ago)
cebtenzzre deleted the ceb/nomic-vulkan branch (2 years ago)
Reviewers: slaren, ggerganov
Assignees: none
Labels: enhancement, high priority, need feedback
Milestone: none