llama.cpp
864a0b67
- CUDA: use mma PTX instructions for FlashAttention (#11583)
Committed 217 days ago
CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>
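For orientation, the sketch below shows what issuing one of these instructions looks like: a warp-level m16n8k8 FP16 multiply-accumulate written as inline PTX. The function name, fragment packing, and surrounding types are illustrative assumptions, not code taken from this commit.

```cuda
#include <cuda_fp16.h>

// Minimal sketch (not the llama.cpp implementation): one warp computes
// D(16x8) += A(16x8) * B(8x8) in FP16 with a single mma.sync instruction.
// Per the PTX fragment layout for this shape, each thread holds two
// packed-half .b32 registers of A, one of B, and two of the accumulator D.
// Requires sm_75 or newer.
__device__ __forceinline__ void mma_m16n8k8_f16(
        unsigned int d[2],        // accumulator fragment, updated in place
        const unsigned int a[2],  // A fragment (row-major)
        const unsigned int b)     // B fragment (column-major)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 750
    asm volatile(
        "mma.sync.aligned.m16n8k8.row.col.f16.f16.f16.f16 "
        "{%0, %1}, {%2, %3}, {%4}, {%0, %1};"
        : "+r"(d[0]), "+r"(d[1])
        : "r"(a[0]), "r"(a[1]), "r"(b));
#endif
}
```

One plausible motivation for raw mma PTX over the nvcuda::wmma API is that it exposes the exact per-thread fragment layout, which is what makes fragment manipulation via __shfl_sync (the movmatrix workaround named in the commit message, apparently needed because movmatrix is unavailable on HIP) possible at all.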
References
#11583 - CUDA: use mma PTX instructions for FlashAttention
Author
JohannesGaessler
Parents
84ec8a58