CUDA: FA support for Deepseek (Ampere or newer) #13306
CUDA: FA support for Deepseek (Ampere or newer)
d19838e9
CISC commented on 2025-05-04
wrap __cvta_generic_to_shared for HIP
187054a7
fix loop unrolling for KV data load
dd054465
slaren approved these changes on 2025-05-08
do loop unrolling via C++ template
fe2b775a
Labels
Nvidia GPU
python
ggml