llama.cpp
CUDA: use async data loading for FlashAttention
#11894
Merged

JohannesGaessler
JohannesGaessler CUDA: use async data loading for FlashAttention
eb4f7954
github-actions added Nvidia GPU
github-actions added ggml
JohannesGaessler try CI fix
727db805
LostRuins
LostRuins approved these changes on 2025-02-17
sorasoras
JohannesGaessler
slaren
slaren approved these changes on 2025-02-16
JohannesGaessler Update ggml/src/ggml-cuda/mma.cuh
a9bf57be
JohannesGaessler merged 73e2ed3c into master 1 year ago
