llama.cpp
dcca0d3a - cpu: introduce chunking for flash attention (#16829)

Factor out the core FA loop into flash_atten_f16_one_chunk and add an outer loop on top that handles the chunks.
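The pattern the message describes — a per-chunk core function plus an outer loop that hands out chunks — might look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the actual ggml code: the signature of flash_atten_f16_one_chunk, the chunk size of 64 rows, the std::thread pool, and the atomic work-claiming counter are all assumptions made for this example; only the factor-out-plus-outer-loop structure comes from the commit message.

```cpp
// Sketch of the chunked flash-attention driver described by the commit.
// Assumed for illustration: signatures, chunk size, and threading model.
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Core FA loop, factored out: processes one contiguous range of output
// rows [ir0, ir1). The per-row work (QK^T, softmax, V accumulation) is
// elided here.
static void flash_atten_f16_one_chunk(int64_t ir0, int64_t ir1) {
    for (int64_t ir = ir0; ir < ir1; ++ir) {
        (void) ir; // ... per-row flash-attention work ...
    }
}

// Outer loop: split the rows into fixed-size chunks and let each thread
// claim the next unprocessed chunk from a shared atomic counter, so
// faster threads naturally pick up more chunks.
static void flash_atten_f16(int64_t nrows, int nthreads) {
    const int64_t chunk_size = 64; // assumed chunk granularity
    const int64_t nchunks    = (nrows + chunk_size - 1) / chunk_size;

    std::atomic<int64_t> next_chunk{0};

    auto worker = [&]() {
        for (;;) {
            const int64_t chunk = next_chunk.fetch_add(1); // claim a chunk
            if (chunk >= nchunks) {
                break; // all chunks handed out
            }
            const int64_t ir0 = chunk * chunk_size;
            const int64_t ir1 = std::min(ir0 + chunk_size, nrows);
            flash_atten_f16_one_chunk(ir0, ir1);
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back(worker);
    }
    for (auto & th : pool) {
        th.join();
    }
}

int main() {
    // Example invocation with assumed sizes.
    flash_atten_f16(/*nrows=*/1000, /*nthreads=*/4);
    return 0;
}
```

Chunking at a fixed granularity rather than statically splitting rows evenly across threads lets the load balance itself when some rows (e.g. longer attention spans) cost more than others.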