PR #7527 CUDA: quantized KV support for FA vec

JohannesGaessler merged 14 commits into ggml-org:master from JohannesGaessler:cuda-fattn-vec-quant-3

github-actions added Nvidia GPU

github-actions added ggml

mofosyne added Review Complexity : High

JohannesGaessler force pushed to bb5fd6d7 2 years ago

github-actions added build

JohannesGaessler marked this pull request as ready for review 2 years ago

JohannesGaessler force pushed from bb5fd6d7 2 years ago

github-actions added testing

JohannesGaessler force pushed 2 years ago

CUDA: quantized KV support for FA vec

672244a8

try CI fix

462add6a

fix commented-out kernel variants

3194a010

add q8_0 q4_0 tests

f0877604

fix nwarps > batch size

f4003cfb

JohannesGaessler force pushed to f4003cfb 2 years ago

split fattn compile via extern templates

84d9277f

JohannesGaessler force pushed to 84d9277f 2 years ago

github-actions added python

fix flake8

61d44b00

fix metal tests

af95ae49

fix cmake

9740ae0a

make generate_cu_files.py executable

2eb0f7f7

add autogenerated .cu files

62056fa6

fix AMD

cc7aef68

error if type_v != FP16 and not flash_attn

d8a0b870

slaren approved these changes on 2024-05-31

remove obsolete code

05133280

JohannesGaessler merged 9b596417 into master 2 years ago

Reviewers

slaren

Assignees

No one assigned

Labels

build, testing, Nvidia GPU, python, Review Complexity : High, ggml

Milestone

No milestone

llama.cpp CUDA: quantized KV support for FA vec #7527 Merged
