llama.cpp
CUDA: quantized KV support for FA vec
#7527
Merged
JohannesGaessler merged 14 commits into ggml-org:master from JohannesGaessler:cuda-fattn-vec-quant-3
github-actions added Nvidia GPU
github-actions added ggml
mofosyne added Review Complexity : High
JohannesGaessler force pushed from fab0e7bd to bb5fd6d7 1 year ago
github-actions added build
JohannesGaessler marked this pull request as ready for review 1 year ago
JohannesGaessler force pushed from bb5fd6d7 to 9ecde9fe 1 year ago
github-actions added testing
JohannesGaessler force pushed from 9c370aef to a62d7cb8 1 year ago
CUDA: quantized KV support for FA vec (672244a8)
try CI fix (462add6a)
fix commented-out kernel variants (3194a010)
add q8_0 q4_0 tests (f0877604)
fix nwarps > batch size (f4003cfb)
JohannesGaessler force pushed from 4d9ed091 to f4003cfb 1 year ago
split fattn compile via extern templates (84d9277f)
JohannesGaessler force pushed from 69515d04 to 84d9277f 1 year ago
github-actions added python
fix flake8 (61d44b00)
fix metal tests (af95ae49)
fix cmake (9740ae0a)
make generate_cu_files.py executable (2eb0f7f7)
add autogenerated .cu files (62056fa6)
fix AMD (cc7aef68)
error if type_v != FP16 and not flash_attn (d8a0b870)
slaren approved these changes on 2024-05-31
remove obsolete code (05133280)
JohannesGaessler merged 9b596417 into master 1 year ago
Reviewers: slaren
Assignees: No one assigned
Labels: build, testing, Nvidia GPU, python, Review Complexity : High, ggml
Milestone: No milestone