CUDA: use mma PTX instructions for FlashAttention #11583
CUDA: use mma PTX instructions for FlashAttention
60958f60
__shfl_sync workaround for movmatrix
e3b7c574
add __shfl_sync to HIP
817f87b2
movmatrix CUDA version: 12.0 -> 11.8
45b1b148
slaren approved these changes on 2025-02-02
Update ggml/src/ggml-cuda/mma.cuh
37910e42
Update ggml/src/ggml-cuda/mma.cuh
51670bd4
Assignees
No one assigned
Labels
Nvidia GPU
python
ggml