llama.cpp
864a0b67 - CUDA: use mma PTX instructions for FlashAttention (#11583)

Commit
217 days ago
CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>
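The commit message mentions a `__shfl_sync` workaround for the `movmatrix` PTX instruction, used where that instruction is unavailable (e.g. on HIP). As a minimal sketch, not taken from the commit itself, warp shuffles can exchange per-thread register values so that a register tile ends up transposed across lanes; the 8x4 tile shape and kernel below are illustrative assumptions:

```cuda
#include <cstdio>

// Illustrative kernel: each of the 32 lanes in a warp holds one element of
// an 8x4 row-major register tile; __shfl_sync reads the element held by the
// lane that owns the transposed position, emulating a matrix transpose
// without the movmatrix instruction.
__global__ void warp_transpose_demo(int *out) {
    const unsigned mask = 0xffffffffu;      // all 32 lanes participate
    int lane = threadIdx.x % 32;
    int val  = lane;                        // this lane's tile element
    // Row-major 8x4: row = lane / 4, col = lane % 4.
    // Transposed source lane: col * 8 + row.
    int src  = (lane % 4) * 8 + lane / 4;
    out[lane] = __shfl_sync(mask, val, src);
}

int main() {
    int *out;
    cudaMallocManaged(&out, 32 * sizeof(int));
    warp_transpose_demo<<<1, 32>>>(out);
    cudaDeviceSynchronize();
    for (int i = 0; i < 32; ++i) printf("%d ", out[i]);
    printf("\n");
    cudaFree(out);
    return 0;
}
```

On HIP the same `__shfl_sync` call compiles via the ROCm wavefront shuffle intrinsics, which is presumably why the commit extends the workaround there.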