Fixes #7055.

This happens regularly, but it's never going to be OK to add backend-specific functions to `test-backend-ops`. Instead, add the necessary checks to the `supports_op` function in ggml-cuda.
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | sm | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------- | ---------: | ---------- | ---------------: |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 1 | pp 512 | 687.77 ± 12.71 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 1 | tg 128 | 34.70 ± 0.30 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 0 | pp 512 | 767.92 ± 1.57 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 0 | tg 128 | 34.36 ± 0.14 |
| qwen2 13B Q5_K - Small | 9.33 GiB | 14.17 B | ROCm | 99 | none | 1 | pp 512 | 1511.97 ± 9.18 |
| qwen2 13B Q5_K - Small | 9.33 GiB | 14.17 B | ROCm | 99 | none | 1 | tg 128 | 57.05 ± 0.02 |
| qwen2 13B Q5_K - Small | 9.33 GiB | 14.17 B | ROCm | 99 | none | 0 | pp 512 | 1773.69 ± 5.63 |
| qwen2 13B Q5_K - Small | 9.33 GiB | 14.17 B | ROCm | 99 | none | 0 | tg 128 | 56.31 ± 0.72 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 1 | pp 1024 | 650.39 ± 8.36 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 1 | pp 2048 | 574.70 ± 3.05 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 1 | pp 4096 | 465.18 ± 3.77 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 1 | tg 128 | 35.06 ± 0.06 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 0 | pp 1024 | 760.63 ± 11.81 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 0 | pp 2048 | 726.50 ± 7.10 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 0 | pp 4096 | 669.02 ± 2.67 |
| qwen2 ?B IQ4_XS - 4.25 bpw | 16.51 GiB | 32.51 B | ROCm | 99 | none | 0 | tg 128 | 33.90 ± 0.30 |
build: de85f908 (2834)
Device 0: Tesla P40, compute capability 6.1, VMM: yes
| model | size | params | backend | ngl | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | ---------- | ---------------: |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 1 | pp 512 | 180.33 ± 0.31 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 1 | tg 128 | 11.35 ± 0.01 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 0 | pp 512 | 201.03 ± 0.31 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 0 | tg 128 | 9.31 ± 0.02 |
| qwen2 13B Q5_K - Small | 9.33 GiB | 14.17 B | CUDA | 99 | 1 | pp 512 | 376.66 ± 0.05 |
| qwen2 13B Q5_K - Small | 9.33 GiB | 14.17 B | CUDA | 99 | 1 | tg 128 | 22.66 ± 0.03 |
| qwen2 13B Q5_K - Small | 9.33 GiB | 14.17 B | CUDA | 99 | 0 | pp 512 | 435.80 ± 0.16 |
| qwen2 13B Q5_K - Small | 9.33 GiB | 14.17 B | CUDA | 99 | 0 | tg 128 | 17.88 ± 0.02 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 1 | pp 1024 | 166.17 ± 0.07 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 1 | pp 2048 | 143.93 ± 0.05 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 1 | pp 4096 | 113.58 ± 0.09 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 1 | tg 128 | 11.30 ± 0.00 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 0 | pp 1024 | 196.44 ± 0.25 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 0 | pp 2048 | 189.21 ± 0.23 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 0 | pp 4096 | 177.17 ± 0.21 |
| qwen2 ?B Q4_K - Medium | 18.34 GiB | 32.51 B | CUDA | 99 | 0 | tg 128 | 9.31 ± 0.01 |
build: de85f908 (2834)
The TG speedup is significant, but PP is quite a bit slower; I don't know why.
There simply isn't yet a kernel optimized for large batch sizes.
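To make the trade-off concrete, here is a minimal sketch of the dispatch situation being described; all struct and function names are hypothetical, not the actual ggml-cuda code. Without tensor cores, every batch size falls back to the FP32 vector kernel, which is tuned for batch size 1, so token generation improves while large-batch prompt processing has no optimized path:

```cpp
// Hypothetical sketch of the kernel selection described above.
struct fattn_params {
    int device;   // CUDA device index
    int n_tokens; // batch size: 1 during token generation, large during prompt processing
};

bool has_tensor_cores(int device);                   // assumed helper: compute capability >= Volta
void fattn_f16_tensor_cores(const fattn_params & p); // existing FP16 kernel, fast at large batch sizes
void fattn_vec_f32(const fattn_params & p);          // this PR's kernel, tuned for batch size 1

void dispatch_fattn(const fattn_params & p) {
    if (has_tensor_cores(p.device)) {
        fattn_f16_tensor_cores(p); // handles both pp and tg efficiently
    } else {
        fattn_vec_f32(p); // tg speeds up; pp regresses until a large-batch FP32 kernel exists
    }
}
```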
```cpp
if (op->src[0]->ne[0] == 64 || op->src[0]->ne[0] == 128) {
    return true;
}
for (int id = 0; id < ggml_backend_cuda_get_device_count(); ++id) {
    if (ggml_cuda_info().devices[id].cc < CC_VOLTA) {
        return false;
    }
}
```
I don't think it is necessary to check every device here; instead, get the context and check only the device for this context. Something like this:

```cpp
ggml_backend_cuda_context * cuda_ctx = (ggml_backend_cuda_context *) backend->context;
if (ggml_cuda_info().devices[cuda_ctx->device].cc < CC_VOLTA) {
    return false;
}
```
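Put together, the resulting `GGML_OP_FLASH_ATTN_EXT` branch in `supports_op` would look roughly like this; a sketch only, assuming (as in the suggestion above) that `backend->context` is a `ggml_backend_cuda_context`:

```cpp
case GGML_OP_FLASH_ATTN_EXT: {
    // head sizes 64 and 128 are covered by the FP32 vector kernel on any device
    if (op->src[0]->ne[0] == 64 || op->src[0]->ne[0] == 128) {
        return true;
    }
    // other head sizes still need the FP16 tensor core kernel, i.e. Volta or newer,
    // checked only for the device this backend context actually runs on
    ggml_backend_cuda_context * cuda_ctx = (ggml_backend_cuda_context *) backend->context;
    return ggml_cuda_info().devices[cuda_ctx->device].cc >= CC_VOLTA;
}
```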
Just adding a small data point: with KoboldCpp compiled with this, running a Q8_K 11B model on a 2x GTX 1080 Ti (Pascal) setup, I see a significant improvement, whereas with FP16 FA I saw a decrease. So it definitely has utility for a subset of users.
📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2 `-q4_0`: 541 iterations 🚀
Attached benchmark charts for "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations", plotting the following server metrics over the run:

- llamacpp:prompt_tokens_seconds
- llamacpp:predicted_tokens_seconds
- llamacpp:kv_cache_usage_ratio
- llamacpp:requests_processing
I don't have any ALiBi models set up for testing, but according to `tests/test-backend-ops` the implementation works correctly.
Hi, I get an error when trying to run with `-fa` on my P100. Is support dropped?
Pascal is still supported; please open an issue.
This PR adds an FP32 FlashAttention kernel that is very similar to the FP16 kernel. It enables using FlashAttention on NVIDIA GPUs without fast FP16 and without tensor cores. It should also provide a speedup on more recent NVIDIA GPUs for batch size 1 and FP32 precision. I have moved the FP16 and FP32 FlashAttention vector kernels to separate files in order to speed up compilation. I also added a function `ggml_backend_cuda_get_device_cc` to `ggml-cuda.h` in order to avoid breaking `tests/test-backend-ops` on NVIDIA GPUs without tensor cores. Unlike with the FP16 kernel, there are no weird issues with arrays of size 1 vs. regular variables.

Performance on 1x P40: