llama.cpp
CUDA: add FP32 FlashAttention vector kernel
#7188
Merged

JohannesGaessler
JohannesGaessler added performance
JohannesGaessler added Nvidia GPU
JohannesGaessler added Review Complexity : High
slaren
JohannesGaessler force pushed 2 years ago
sorasoras
JohannesGaessler
slaren
slaren commented on 2024-05-10
JohannesGaessler force pushed to e0d11842 2 years ago
scottmudge
github-actions
JohannesGaessler CUDA: add FP32 FlashAttention vector kernel
bbeb952a
JohannesGaessler fixup! CUDA: add FP32 FlashAttention vector kernel
41f5f3a4
JohannesGaessler force pushed from e0d11842 to 41f5f3a4 2 years ago
JohannesGaessler fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
f3c3eafa
JohannesGaessler fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
aa9cbd76
JohannesGaessler
slaren
slaren approved these changes on 2024-05-12
JohannesGaessler merged dc685be4 into master 2 years ago
gilbrotheraway
JohannesGaessler
