llama.cpp
CUDA: use async data loading for FlashAttention
#11894

Merged

JohannesGaessler merged 3 commits into ggml-org:master from JohannesGaessler:cuda-fa-mma-17

CUDA: use async data loading for FlashAttention

eb4f7954

github-actions added Nvidia GPU

github-actions added ggml

try CI fix

727db805

LostRuins approved these changes on 2025-02-17

slaren approved these changes on 2025-02-16

Update ggml/src/ggml-cuda/mma.cuh

a9bf57be

JohannesGaessler merged 73e2ed3c into master 1 year ago

Reviewers

slaren

LostRuins

Labels

Nvidia GPU, ggml
