llama.cpp
12a81af4
Commit
125 days ago
CUDA: broadcasting for FlashAttention mask (#14500)
References
#14435 - ggml : support broadcast for ggml_soft_max_ext and ggml_flash_attn_ext
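To illustrate what "broadcasting" means for the FlashAttention mask: a single [n_q x n_kv] mask slice can be reused for every attention head, rather than storing one mask copy per head. The following is a minimal host-side C++ sketch of that idea only; it is not ggml's API or the actual CUDA kernel from this commit, and the function name, tensor layout, and sizes are illustrative assumptions.

// Illustrative sketch (not ggml code): add one shared [n_q x n_kv] mask to
// per-head attention scores by broadcasting it across the head dimension.
#include <cstdio>
#include <vector>

// scores: [n_head][n_q][n_kv], mask: [n_q][n_kv] shared by every head
static void apply_mask_broadcast(std::vector<float> & scores,
                                 const std::vector<float> & mask,
                                 int n_head, int n_q, int n_kv) {
    for (int h = 0; h < n_head; ++h) {
        for (int iq = 0; iq < n_q; ++iq) {
            for (int ik = 0; ik < n_kv; ++ik) {
                // the mask index omits h: the same mask slice is reused
                // (broadcast) for all heads instead of keeping n_head copies
                scores[(h*n_q + iq)*n_kv + ik] += mask[iq*n_kv + ik];
            }
        }
    }
}

int main() {
    const int n_head = 2, n_q = 2, n_kv = 3;
    std::vector<float> scores(n_head*n_q*n_kv, 0.0f);
    std::vector<float> mask = {0.0f, 0.0f, -1e9f,   // mask row for q0
                               0.0f, 0.0f,  0.0f};  // mask row for q1
    apply_mask_broadcast(scores, mask, n_head, n_q, n_kv);
    // the masked position is blocked in every head, not just head 0
    printf("scores[head=1, q=0, k=2] = %g\n", scores[(1*n_q + 0)*n_kv + 2]);
    return 0;
}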
Author
JohannesGaessler
Committer
ggerganov
Parents
8875523e