llama.cpp
CUDA: add FP32 FlashAttention vector kernel
#7188
Merged
JohannesGaessler merged 4 commits into ggml-org:master from JohannesGaessler:cuda-fa-no-tc-11.
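As the branch name cuda-fa-no-tc-11 suggests, the kernel computes FlashAttention entirely in FP32 without tensor cores, which makes the feature usable on GPUs that lack tensor core support. The sketch below is not the llama.cpp kernel: it is a minimal, deliberately unoptimized illustration of the online-softmax accumulation that FlashAttention-style vector kernels are built on. All names, the one-thread-per-query-row launch, and the row-major [tokens, d] layouts for Q, K, V, and O are assumptions for illustration only.

```cuda
// Illustrative sketch only, not the llama.cpp kernel. One thread handles
// one query row; Q, K, V, O are assumed row-major [tokens, d].
#include <cuda_runtime.h>
#include <math.h>

__global__ void attn_row_f32_sketch(const float *Q, const float *K,
                                    const float *V, float *O,
                                    int n_q, int n_kv, int d, float scale) {
    const int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= n_q) {
        return;
    }
    const float *q = Q + (size_t) r * d;
    float       *o = O + (size_t) r * d;

    float m = -INFINITY; // running maximum of the scaled logits
    float s = 0.0f;      // running softmax denominator
    for (int i = 0; i < d; ++i) {
        o[i] = 0.0f;     // running (unnormalized) output accumulator
    }

    // Stream over the keys/values exactly once.
    for (int j = 0; j < n_kv; ++j) {
        const float *k = K + (size_t) j * d;
        float x = 0.0f;
        for (int i = 0; i < d; ++i) {
            x += q[i] * k[i];
        }
        x *= scale;

        const float m_new = fmaxf(m, x);
        const float c = expf(m - m_new); // rescales everything seen so far
        const float p = expf(x - m_new); // weight of the current key

        s = s * c + p;
        const float *v = V + (size_t) j * d;
        for (int i = 0; i < d; ++i) {
            o[i] = o[i] * c + p * v[i];
        }
        m = m_new;
    }

    // Normalize once at the end; all accumulation stayed in FP32.
    const float inv_s = 1.0f / s;
    for (int i = 0; i < d; ++i) {
        o[i] *= inv_s;
    }
}
```

A hypothetical launch would look like `attn_row_f32_sketch<<<(n_q + 63) / 64, 64>>>(Q, K, V, O, n_q, n_kv, d, 1.0f / sqrtf((float) d));`. The key property is that the kernel streams over the KV data in a single pass, rescaling the running numerator and denominator by expf(m - m_new) whenever a new maximum appears, so the full n_q x n_kv attention matrix is never materialized. A real vector kernel parallelizes the dot products and accumulation across the threads of a block; the sketch keeps the math sequential for clarity.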
JohannesGaessler added the performance, Nvidia GPU, and Review Complexity : High labels.
JohannesGaessler force-pushed from 29e01c3b to de85f908 (1 year ago).
slaren commented on 2024-05-10.
JohannesGaessler force-pushed from de85f908 to e0d11842 (1 year ago).
CUDA: add FP32 FlashAttention vector kernel (bbeb952a)
fixup! CUDA: add FP32 FlashAttention vector kernel (41f5f3a4)
JohannesGaessler force-pushed from e0d11842 to 41f5f3a4 (1 year ago).
fixup! fixup! CUDA: add FP32 FlashAttention vector kernel (f3c3eafa)
fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel (aa9cbd76)
slaren approved these changes on 2024-05-12.
JohannesGaessler merged dc685be4 into master (1 year ago).
Reviewers: slaren
Assignees: no one assigned
Labels: performance, Nvidia GPU, Review Complexity : High
Milestone: no milestone