llama.cpp
CUDA: deduplicate FlashAttention code #7352
Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fattn-refactor-4
slaren approved these changes on 2024-05-17
mofosyne added the refactoring label
mofosyne added the Nvidia GPU label
mofosyne added the Review Complexity : High label
ggerganov approved these changes on 2024-05-18
mofosyne added the merge ready label
Commit: CUDA: deduplicate FlashAttention code (4d9e90ca)
JohannesGaessler force-pushed from 3ac059bc to 4d9e90ca (1 year ago)
JohannesGaessler merged 133d99c5 into master (1 year ago)
github-actions added the ggml label
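The PR carries no description here, so as a loose illustration only: deduplicating CUDA kernels of this kind is typically done by folding near-identical kernel copies into a single C++ template parameterized on compile-time constants (e.g. the head size), with one launcher dispatching on the runtime value. The sketch below shows that general pattern under those assumptions; it is not code from this PR, and `scale_heads` / `launch_scale_heads` are hypothetical names.

```cpp
// Hypothetical sketch of kernel deduplication via templates; not from this PR.
#include <cstdio>
#include <cuda_runtime.h>

// One templated kernel replaces several near-identical copies that differ
// only in the compile-time head size D.
template <int D>
__global__ void scale_heads(float * x, float scale, int n_heads) {
    const int head = blockIdx.x;   // one block per attention head
    const int i    = threadIdx.x;  // one thread per element within the head
    if (head < n_heads && i < D) {
        x[head*D + i] *= scale;
    }
}

// A single launcher dispatches on the runtime head size, so the kernel body
// is written exactly once instead of once per supported size.
static void launch_scale_heads(float * x, float scale, int n_heads, int head_size, cudaStream_t stream) {
    switch (head_size) {
        case  64: scale_heads< 64><<<n_heads,  64, 0, stream>>>(x, scale, n_heads); break;
        case 128: scale_heads<128><<<n_heads, 128, 0, stream>>>(x, scale, n_heads); break;
        default:  fprintf(stderr, "unsupported head size: %d\n", head_size);        break;
    }
}

int main() {
    const int n_heads = 4, head_size = 128;
    float * d_x = nullptr;
    cudaMalloc(&d_x, n_heads*head_size*sizeof(float));
    cudaMemset(d_x, 0, n_heads*head_size*sizeof(float));
    launch_scale_heads(d_x, 0.5f, n_heads, head_size, /*stream=*/0);
    cudaDeviceSynchronize();
    cudaFree(d_x);
    return 0;
}
```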
Reviewers
ggerganov
slaren
Assignees
No one assigned
Labels
refactoring
Nvidia GPU
Review Complexity : High
ggml
merge ready
Milestone
No milestone