llama.cpp
55a1c5a5
- CUDA: add softmax broadcast (#14475)
Commit
202 days ago
CUDA: add softmax broadcast (#14475)

* CUDA: add softmax broadcast
* Pass by const ref
* Review: Use blockDims for indexing, remove designated initializers
* Add TODO for non-contiguous input/output
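For context, the broadcast lets the mask applied by the softmax have fewer rows than the input, with one mask row reused across several data rows (see #14435 under References). Below is a minimal, hypothetical CUDA sketch of that idea, not the actual llama.cpp kernel: the name soft_max_broadcast, the row-major layout, and the modulo-based mask indexing are all assumptions for illustration. It uses the usual three-pass pattern (row max, exponentiate and sum, normalize), with one block per row and a power-of-two block size for the shared-memory reductions.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Row-wise softmax with an additively applied mask that is broadcast across
// rows via modulo indexing. One block per row; blockDim.x must be a power of
// two for the tree reductions below. Illustrative sketch only.
__global__ void soft_max_broadcast(const float * x, const float * mask,
                                   float * dst, int ncols, int mask_rows) {
    extern __shared__ float buf[];          // blockDim.x floats
    const int row = blockIdx.x;
    const int tid = threadIdx.x;
    const float * xr = x + (size_t) row * ncols;
    // Broadcast (assumption): the mask row repeats every mask_rows rows.
    const float * mr = mask ? mask + (size_t)(row % mask_rows) * ncols : nullptr;
    float       * dr = dst + (size_t) row * ncols;

    // 1) row maximum, for numerical stability
    float vmax = -INFINITY;
    for (int c = tid; c < ncols; c += blockDim.x) {
        vmax = fmaxf(vmax, xr[c] + (mr ? mr[c] : 0.0f));
    }
    buf[tid] = vmax;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] = fmaxf(buf[tid], buf[tid + s]);
        __syncthreads();
    }
    vmax = buf[0];
    __syncthreads();                        // buf is reused below

    // 2) exponentiate and accumulate the row sum
    float vsum = 0.0f;
    for (int c = tid; c < ncols; c += blockDim.x) {
        const float e = expf(xr[c] + (mr ? mr[c] : 0.0f) - vmax);
        dr[c] = e;
        vsum += e;
    }
    buf[tid] = vsum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    vsum = buf[0];

    // 3) normalize
    for (int c = tid; c < ncols; c += blockDim.x) {
        dr[c] /= vsum;
    }
}

int main() {
    const int nrows = 4, ncols = 8;
    const int mask_rows = 2;                // one mask row shared by every 2 data rows
    float *x, *mask, *dst;
    cudaMallocManaged(&x,    nrows     * ncols * sizeof(float));
    cudaMallocManaged(&mask, mask_rows * ncols * sizeof(float));
    cudaMallocManaged(&dst,  nrows     * ncols * sizeof(float));
    for (int i = 0; i < nrows * ncols; ++i)     x[i]    = (float)(i % ncols);
    for (int i = 0; i < mask_rows * ncols; ++i) mask[i] = (i % 2) ? 0.0f : -INFINITY;

    const int nthreads = 256;               // power of two, required by the reductions
    soft_max_broadcast<<<nrows, nthreads, nthreads * sizeof(float)>>>(
        x, mask, dst, ncols, mask_rows);
    cudaDeviceSynchronize();

    for (int c = 0; c < ncols; ++c) printf("%.4f ", dst[c]);
    printf("\n");
    cudaFree(x); cudaFree(mask); cudaFree(dst);
    return 0;
}
```

Per the commit message, the real kernel indexes with blockDims (as requested in review) and leaves non-contiguous input/output as a TODO; the sketch above sidesteps both by assuming contiguous row-major tensors.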
References
#14435 - ggml : support broadcast for ggml_soft_max_ext and ggml_flash_attn_ext
Author: am17an
Committer: ggerganov
Parents: 12a81af4