llama.cpp
dcca0d3a - cpu: introduce chunking for flash attention (#16829)
Commit
6 days ago
cpu: introduce chunking for flash attention (#16829)

Factor out the core FA loop into flash_atten_f16_one_chunk and add an outer loop on top that handles the chunks.
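Below is a minimal sketch of the factor-out-plus-outer-loop structure the commit message describes: a per-chunk kernel that runs the core attention loop over one slice of the KV range, and an outer loop that walks the range in fixed-size chunks. All function names, parameters, and the choice of chunking dimension here are illustrative assumptions, not the actual llama.cpp code.

```cpp
// Sketch of the "one chunk" kernel + outer chunking loop.
// Names, signatures, and the chunked dimension are assumptions for illustration.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical per-chunk kernel: runs the core attention loop over the
// positions in [start, end) and returns its partial accumulation.
static float flash_attn_one_chunk(const std::vector<float> & scores,
                                  int64_t start, int64_t end) {
    float acc = 0.0f;
    for (int64_t i = start; i < end; ++i) {
        acc += scores[i]; // stand-in for the real per-position FA work
    }
    return acc;
}

// Outer loop added on top: split the full range into fixed-size chunks
// and hand each chunk to the per-chunk kernel.
static float flash_attn_chunked(const std::vector<float> & scores,
                                int64_t chunk_size) {
    const int64_t len = (int64_t) scores.size();
    float acc = 0.0f;
    for (int64_t start = 0; start < len; start += chunk_size) {
        const int64_t end = std::min(start + chunk_size, len);
        acc += flash_attn_one_chunk(scores, start, end);
    }
    return acc;
}

int main() {
    std::vector<float> scores(1000, 0.5f);
    printf("chunked result: %f\n", flash_attn_chunked(scores, 256));
    return 0;
}
```

Structuring the kernel this way lets the outer loop decide how work is divided (for example, handing chunks to worker threads) without touching the inner attention math.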
References
#16829 - cpu: introduce chunking for flash attention
Author
max-krasnyansky
Parents
bacddc04