llama.cpp
CUDA: use mma PTX instructions for FlashAttention
#11583

Merged

JohannesGaessler merged 6 commits into ggml-org:master from JohannesGaessler:cuda-fa-mma-5

CUDA: use mma PTX instructions for FlashAttention

60958f60

github-actions added Nvidia GPU

github-actions added python

github-actions added ggml

ggerganov commented on 2025-02-02

__shfl_sync workaround for movmatrix

e3b7c574

add __shfl_sync to HIP

817f87b2

movmatrix CUDA version: 12.0 -> 11.8

45b1b148

slaren approved these changes on 2025-02-02

Update ggml/src/ggml-cuda/mma.cuh

37910e42

Update ggml/src/ggml-cuda/mma.cuh

51670bd4

JohannesGaessler merged 864a0b67 into master 1 year ago

IMbackK commented on 2025-02-02

Reviewers

slaren

IMbackK

ggerganov

Labels

Nvidia GPU, python, ggml
