llama.cpp
CUDA: use async data loading for FlashAttention
#11894
Merged
