llama.cpp
Nomic Vulkan backend #4456 (Merged)
cebtenzzre merged 155 commits into master from ceb/nomic-vulkan
ba15dfd0  Nomic vulkan backend licensed under the Software for Open Models Lice…
48a45ea4  Remove warning which fails on windows.
8563fa00  remove dynamic deps from kompute build
45c8778b  Switch to a dynamic dispatch table instead of linking hard against li…
b7e2e691  Completely revamp how we do object management with the vulkan backend…
beee5726  Make kompute actually include external SDK headers when requested
68cf1df6  Throw an exception when allocation fails for vulkan.
8bea7198  vulkan: disambiguate gpus with the same name
bd5f6399  Don't try and install kompute artifacts.
4ed25b2f  Sync from device back to host at begin of new prompt.
68aca6be  Only use vulkan with known quant that work.
addac252  Set the singleton to nullptr here.
2c24d67e  Don't crash on available devices if we can't even create an instance.
1b1416d7  Support for gguf.
6b6c73a9  kompute : don't fail build because of -Warray-bounds
9e4f8b4a  Upload immediately to device.
77135a3b  Add a common boilerplate code via include and elim copy pasta
93306f16  Consolidate code for mat x vec kernels and use subgroups more extensi…
601905e7  Move the subgroups and printf into common.
5509f743  Minor cleanup.
4b223ec4  Refactor getrows to use common code and get ready for q6_k.
f1c9bc18  Add q6_k getrows and mul*vec kernel.
06d4b215  Fix offset into the qh and now we have working vulkan accelerated for…
32289aa4  Fixes for norm.
6ac39752  Fixup the upstream CMakelists.txt so we can build just llama.cpp with…
de589ced  Change this back to be in agreement with metal and our previous softm…
bc4b5ed1  Fixes for subgroup size to bring AMD and NVIDIA inline with eachother…
24a4a595  kompute : only try to use Vulkan for LLaMA itself
3d850db7  kompute : remove Q6_K from list of supported quant types
9db90cbe  f16 mv broadcasting fix (gqa fix)
ff4212d2  q8 mat*vec
020b1745  vulkan: implement neox mode for rope
8564f790  falcon h2d + reenable vulkan
09d83f04  Delete TODO now that we have q8_0.
f0cd38b9  add mat*mat ops
46385ee0  misc vulkan cleanup
3327d84a  perf: use bigger threadgroups in mm
d5741c07  use op param epsilon for norms
b78a94bc  q6k mm works
4809890d  rm commented dbg print
cd0257ed  q4_1 mat*mat
8dc79ac3  clean up vulkan/cpu switch
9bc52eba  attempted speedups
c1fd6454  attempted speedups 2
cc05a602  use mat*vec shaders for mat*mat
21841d31  kompute : enable kp_logger and make it static (#8)
cbc0d1af  kompute : make scripts executable
84000153  Don't try an allocation on a heap that is smaller than the size we re…
752f7ebd  Remove unused push constant that was giving validation errors.
8d9efbf9  Lower the workgroup count for some shaders by providing a loop that p…
74ddf0f1  Fix synchronization problem for AMD Radeon with amdvlk driver or windows
1c170101  vulkan : fix missing break in matmul selection (#9)
89b71278  llama : decide to disable Vulkan before loading tensors (#7)
e006d377  Scale the workgroup count down to allow correct generation for falcon…
a5eb001e  Revert the prompt processing on gpu for now.
ffd0624b  Remove this debug code.
f88b1988  llama : fix Vulkan whitelist (#11)
a8cac532  kompute : fix issues with debug layers
c438c168  fix build with external fmtlib (v10)
af00cca0  Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD
71565eb0  vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)
84f7fc45  vulkan : rope n_past is now KQ_pos, f16 rope kernel
39abedd1  vulkan : optimize workgroup sizes
f194e1b6  Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vu…
a934b2cb  vulkan : assert various kernel requirements
2a41ba72  Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vu…
6474fc87  vulkan : handle ggml_scale for n%8 != 0
fe26e6ad  Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vu…
9c4dfd06  mention skipped change
02c3309f  merge fixup (e16b9fa4baa8a09c6619b116159830e898050942)
1829f1d7  Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-v…
208cd52f  vulkan : implement YaRN RoPE scaling (#2268)
23f6d51f  Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d' into nomic-vu…
a4bb9c5c  vulkan : sync with "migrate to dynamic graphs"
9ae88baf  Merge remote-tracking branch 'upstream/master' into nomic-vulkan-redo
56430c32  relicense Vulkan backend as MIT
3e09e127  rename ggml-vulkan -> ggml-kompute
27631dbb  separate shaders from kompute itself
747e1eaf  Merge commit '81bc9214a389362010f7a57f4cbc30e5f83a2d28' into nomic-vu…
b906e126  kompute : fix compile warnings
9af7f58b  move kompute to a submodule
f7cb0a65  remove script with unclear purpose
cebtenzzre requested a review from ggerganov (2 years ago)
cebtenzzre added the enhancement label
ggerganov commented on 2023-12-14
c8fd4ba8  ggml : restore 'static' specifiers
ggerganov commented on 2023-12-14
ggerganov added the need feedback label
f58f581c  refactor llama.cpp modifications
cebtenzzre commented on 2023-12-15
2d2c76ac  vulkan : fix free of stack addr in llama_buffer
80727062  kompute : always destroy Manager via the destructor
44b1a97a  kompute : fix -Wunused-private-field warnings from clang
8b65f4c5  Merge commit 'bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56' into ceb/nomi…
3959283e  Merge commit '31f27758faf4a4bd08101a57c7ec3a473f771f86' into ceb/nomi…
904c563d  sync xxd commands with GPT4All llama.cpp.cmake
ae6d6824  Merge commit 'd232aca5a73b290e218a2e48b91023d5e994203f' into ceb/nomi…
3773e1af  Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre marked this pull request as draft (2 years ago)
ggerganov commented on 2024-01-17
cebtenzzre force-pushed to 1637294a (2 years ago)
7c527eb5  Merge commit 'e7e4df031b9e29d4b55a4e0b0295187f6b213db1' into HEAD
298d6eec  kompute : initial attempt at ggml-backend v2 support
5f660dad  fix assertion failure
070919db  attempt to get test-backend-ops working
cad72e12  add sanity check and fix kompute teardown order
76474a7c  kompute : ignore exceptions in ggml_vk_available_devices (#12)
d6bd4716  kompute : fix rope_f32 and scale ops (#5008)
9431026a  clean up old backend code
e9d5223d  actually fix this assertion
729e1a4c  sync op_rope_f16 with recent op_rope_f32 changes
07530731  never try to evaluate an empty command buffer
2f6a279e  fix supported ops for kompute backend
33e8d6ab  kompute : fix ggml_add kernel (#5027)
cb9ceff9  minor cleanup
0899adf8  kompute : fix get_rows dispatch -> 4 less failures
08e23fd7  kompute : fix op_mul kernel -> 13 less test failures
2755ae3d  kompute : fix more dispatch ambiguity -> 12 less failures
0787b80d  kompute : remove broken mulrow kernel -> 1 less test failure
1a14099c  fix q4_0/q4_1 mmv, 65 -> 49 failures
2b0f642f  fix f16 mmv, 49 -> 41 failures
cebtenzzre force-pushed from 1637294a to 2b0f642f (2 years ago)
2852902e  test-backend-ops : add llama test
14509660  test-backend-ops : test scale parameter of ggml_soft_max_ext
308f2796  kompute : support scale parameter of softmax
8bd38fe3  test-backend-ops : test mask parameter of ggml_soft_max_ext
df687b10  kompute : support mask parameter of softmax
ebb5f7e9  test-backend-ops : test llama with different batch sizes
ec68a965  test-backend-ops : increase max_nmse_err so Llama passes
987335ea  kompute : fix algorithm names
f5ac6354  kompute : fix q8_0 mmv, 41 -> 28 failures
1849b854  test-backend-ops : add Falcon test
6fc99a6e  test-backend-ops : test larger GELU range
38d1f0c7  kompute : fix op_gelu -> Falcon is working on AMDVLK
11b30508  test-backend-ops : restore softmax tests
de9fba0d  kompute : fix basic f16 get_rows, 28 -> 26 failures
445a3734  kompute : fix basic Q6_K get_rows, 26 -> 24 failures
3fbf0529  kompute : mark last few failing ops as unsupported
39151942  test-backend-ops : make Falcon test faster with a smaller model
cebtenzzre marked this pull request as ready for review (2 years ago)
bc287047  kompute : remove unused immintrin.h #include
91654ff0  kompute : fix a -Wstrict-aliasing warning
ggerganov added the high priority label
ggerganov approved these changes on 2024-01-26
ggerganov requested a review from slaren (2 years ago)
slaren commented on 2024-01-26
61a5cf88  kompute : remove unnecessary use_mmap=false
e6ce5f21  llama : revert unintended whitespace change
aea84989  Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
2512799c  test-backend-ops : comment out Llama and Falcon tests
8ca33dec  test-backend-ops : check all the ops in the test for support in the b…
6af02b19  kompute : init device automatically and remove an unnecessary free
2ff2d161  ggml-kompute.h : remove anything that doesn't need to be public
cdab4043  kompute : fix #includes
454baeba  op_mul_mat_mat_f32.comp : fix missing final newline
297fde5f  editorconfig-checker : exclude .gitmodules
slaren left two comments on 2024-01-26
91324851  ci : initial attempt at testing Kompute backend
57cecad1  main : remove ggml-kompute.h #include
4b0c96a9  kompute : adapt ggml-kompute API to be compatible with C
e6edd44d  ci : attempt to fix Vulkan installer path
050d4502  ci : do not run tests for Kompute (no GPU)
53046255  kompute : use llama_backend_init/llama_backend_free to manage device
slaren commented on 2024-01-27
be7c0559  kompute : better device management
da1dc666  Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
dc08e512  kompute : fix merge issues
7e11fe08  kompute : remove llama_load_model_from_file_internal
cebtenzzre requested a review from slaren (2 years ago)
b932cd74  vulkan : correctly fix use-after-free in ggml_vk_current_device
48db724b  minor fixup
1f98dff7  fix trailing whitespace
29982114  fix incorrect memcpy
slaren approved these changes on 2024-01-29
cebtenzzre merged fbf1ddec into master (2 years ago)
cebtenzzre deleted the ceb/nomic-vulkan branch (2 years ago)
Reviewers: slaren, ggerganov
Assignees: none
Labels: enhancement, high priority, need feedback
Milestone: none