llama.cpp
CUDA: add FP32 FlashAttention vector kernel #7188 (Merged)
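For context on what a "FlashAttention vector kernel" computes: for a single query vector, FlashAttention-style kernels make one streaming pass over the K/V rows using an online softmax (running max and running sum), so no score buffer of sequence length is needed. The sketch below is a plain-FP32 CPU illustration of that accumulation scheme; it is not the PR's CUDA code, and all names in it are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Attention for ONE query vector q (head size d) against n K/V rows,
// using the online-softmax recurrence: keep a running max m, running
// denominator s, and a running weighted sum of V rows, rescaling the
// accumulators whenever the max increases. Single pass, all FP32.
std::vector<float> attend_one_query(const std::vector<float>& q,
                                    const std::vector<std::vector<float>>& K,
                                    const std::vector<std::vector<float>>& V) {
    const size_t d = q.size();
    const float scale = 1.0f / std::sqrt((float) d);

    float m = -INFINITY;              // running max of scaled scores
    float s = 0.0f;                   // running sum of exp(score - m)
    std::vector<float> acc(d, 0.0f);  // running sum of exp(score - m) * V[i]

    for (size_t i = 0; i < K.size(); ++i) {
        float score = 0.0f;
        for (size_t j = 0; j < d; ++j) {
            score += q[j] * K[i][j];
        }
        score *= scale;

        const float m_new = std::fmax(m, score);
        const float c = std::exp(m - m_new);      // rescales old accumulators (0 on first row)
        const float p = std::exp(score - m_new);  // weight of the new row

        s = s * c + p;
        for (size_t j = 0; j < d; ++j) {
            acc[j] = acc[j] * c + p * V[i][j];
        }
        m = m_new;
    }

    for (size_t j = 0; j < d; ++j) {
        acc[j] /= s;  // final softmax normalization
    }
    return acc;
}
```

In an actual CUDA vector kernel the inner dot product and the V accumulation are spread across the threads of a warp or block, but the FP32 max/sum/rescale bookkeeping per query is the same idea.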

JohannesGaessler added the labels: performance, Nvidia GPU, Review Complexity : High
JohannesGaessler force-pushed from 29e01c3b to de85f908 1 year ago
slaren commented on 2024-05-10
JohannesGaessler force-pushed from de85f908 to e0d11842 1 year ago
JohannesGaessler added commit bbeb952a: CUDA: add FP32 FlashAttention vector kernel
JohannesGaessler added commit 41f5f3a4: fixup! CUDA: add FP32 FlashAttention vector kernel
JohannesGaessler force-pushed from e0d11842 to 41f5f3a4 1 year ago
JohannesGaessler added commit f3c3eafa: fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
JohannesGaessler added commit aa9cbd76: fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
slaren approved these changes on 2024-05-12
JohannesGaessler merged dc685be4 into master 1 year ago
