llama.cpp
CUDA: deduplicate FlashAttention code #7352
Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fattn-refactor-4
slaren approved these changes on 2024-05-17
mofosyne added the refactoring label
mofosyne added the Nvidia GPU label
mofosyne added the Review Complexity : High label
ggerganov approved these changes on 2024-05-18
mofosyne added the merge ready label
Commit: CUDA: deduplicate FlashAttention code (4d9e90ca)
JohannesGaessler force-pushed from 3ac059bc to 4d9e90ca (1 year ago)
JohannesGaessler merged 133d99c5 into master (1 year ago)
github-actions added the ggml label
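The PR carries no description here, so as a loose illustration only: deduplicating CUDA kernels of this kind is typically done by folding near-identical kernel copies into a single C++ template parameterized on compile-time constants (e.g. the head size), with one launcher dispatching on the runtime value. The sketch below shows that general pattern under those assumptions; it is not code from this PR, and `scale_heads` / `launch_scale_heads` are hypothetical names.

```cpp
// Hypothetical sketch of kernel deduplication via templates; not from this PR.
#include <cstdio>
#include <cuda_runtime.h>

// One templated kernel replaces several near-identical copies that differ
// only in the compile-time head size D.
template <int D>
__global__ void scale_heads(float * x, float scale, int n_heads) {
    const int head = blockIdx.x;   // one block per attention head
    const int i    = threadIdx.x;  // one thread per element within the head
    if (head < n_heads && i < D) {
        x[head*D + i] *= scale;
    }
}

// A single launcher dispatches on the runtime head size, so the kernel body
// is written exactly once instead of once per supported size.
static void launch_scale_heads(float * x, float scale, int n_heads, int head_size, cudaStream_t stream) {
    switch (head_size) {
        case  64: scale_heads< 64><<<n_heads,  64, 0, stream>>>(x, scale, n_heads); break;
        case 128: scale_heads<128><<<n_heads, 128, 0, stream>>>(x, scale, n_heads); break;
        default:  fprintf(stderr, "unsupported head size: %d\n", head_size);        break;
    }
}

int main() {
    const int n_heads = 4, head_size = 128;
    float * d_x = nullptr;
    cudaMalloc(&d_x, n_heads*head_size*sizeof(float));
    cudaMemset(d_x, 0, n_heads*head_size*sizeof(float));
    launch_scale_heads(d_x, 0.5f, n_heads, head_size, /*stream=*/0);
    cudaDeviceSynchronize();
    cudaFree(d_x);
    return 0;
}
```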
Reviewers
ggerganov
slaren
Assignees
No one assigned
Labels
refactoring
Nvidia GPU
Review Complexity : High
ggml
merge ready
Milestone
No milestone