llama.cpp commit 73e2ed3c
CUDA: use async data loading for FlashAttention (#11894)
Committed: 261 days ago
CUDA: use async data loading for FlashAttention (#11894)

* CUDA: use async data loading for FlashAttention

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
References: #11894 - CUDA: use async data loading for FlashAttention
Author: JohannesGaessler
Parents: f7b1116a