llama.cpp
CUDA: deduplicate FlashAttention code #7352 (Merged)
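The PR carries no written description, but the title names a common refactoring pattern: collapsing near-identical CUDA kernel launch paths into a single templated helper plus one runtime dispatch point. The sketch below is a minimal illustration of that general pattern, not the actual change in this PR; the names `fattn_kernel_sketch`, `launch_fattn_sketch`, `launch_fattn_dispatch`, and the `head_size` template parameter are all hypothetical, invented for this example.

```cpp
#include <cuda_runtime.h>

// Hypothetical FlashAttention-style kernel, templated on head size.
// A real kernel would compute softmax(Q*K^T)*V tile by tile; here a
// stand-in computation keeps the sketch self-contained and compilable.
template <int head_size>
__global__ void fattn_kernel_sketch(const float * Q, const float * K,
                                    const float * V, float * dst, int n_tokens) {
    const int tid = blockIdx.x * head_size + threadIdx.x;
    if (blockIdx.x < n_tokens && threadIdx.x < head_size) {
        dst[tid] = Q[tid] + K[tid] + V[tid]; // stand-in for the real math
    }
}

// One templated launcher replaces a copy-pasted launch function per head
// size; the grid/block shape here is purely illustrative.
template <int head_size>
static void launch_fattn_sketch(const float * Q, const float * K,
                                const float * V, float * dst,
                                int n_tokens, cudaStream_t stream) {
    const dim3 blocks(n_tokens);    // one block per token (illustrative)
    const dim3 threads(head_size);  // one thread per head element
    fattn_kernel_sketch<head_size><<<blocks, threads, 0, stream>>>(Q, K, V, dst, n_tokens);
}

// Single runtime dispatch over supported head sizes: supporting a new
// size becomes one case here instead of another duplicated launcher.
static void launch_fattn_dispatch(int head_size, const float * Q, const float * K,
                                  const float * V, float * dst,
                                  int n_tokens, cudaStream_t stream) {
    switch (head_size) {
        case  64: launch_fattn_sketch< 64>(Q, K, V, dst, n_tokens, stream); break;
        case 128: launch_fattn_sketch<128>(Q, K, V, dst, n_tokens, stream); break;
        default:  break; // unsupported head size
    }
}
```

Making the head size a compile-time template parameter is what typically enables the deduplication: the compiler can fully unroll per-head loops for each instantiation, so the variants no longer need hand-written copies to stay fast.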

slaren approved these changes on 2024-05-17
mofosyne added the refactoring, Nvidia GPU, and Review Complexity : High labels
ggerganov approved these changes on 2024-05-18
mofosyne added the merge ready label
JohannesGaessler committed CUDA: deduplicate FlashAttention code (4d9e90ca)
JohannesGaessler force-pushed from 3ac059bc to 4d9e90ca 1 year ago
JohannesGaessler merged commit 133d99c5 into master 1 year ago
github-actions added the ggml label
