llama.cpp
CUDA: use mma PTX instructions for FlashAttention
#11583
Merged

JohannesGaessler
JohannesGaessler CUDA: use mma PTX instructions for FlashAttention
60958f60
github-actions added Nvidia GPU
github-actions added python
github-actions added ggml
JohannesGaessler
ggerganov
ggerganov commented on 2025-02-02
sorasoras
JohannesGaessler
JohannesGaessler __shfl_sync workaround for movmatrix
e3b7c574
JohannesGaessler
JohannesGaessler add __shfl_sync to HIP
817f87b2
JohannesGaessler movmatrix CUDA version: 12.0 -> 11.8
45b1b148
JohannesGaessler
slaren
slaren approved these changes on 2025-02-02
JohannesGaessler Update ggml/src/ggml-cuda/mma.cuh
37910e42
JohannesGaessler Update ggml/src/ggml-cuda/mma.cuh
51670bd4
JohannesGaessler merged 864a0b67 into master 1 year ago
IMbackK
IMbackK commented on 2025-02-02
JohannesGaessler
IMbackK
ggerganov
