llama.cpp
dcca0d3a - cpu: introduce chunking for flash attention (#16829)

Factor out the core FA loop into flash_atten_f16_one_chunk and add an outer loop on top that handles the chunks.
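The pattern the message describes — a per-chunk core function plus an outer loop that hands out chunks — might look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the actual ggml code: the signature of flash_atten_f16_one_chunk, the chunk size of 64 rows, the std::thread pool, and the atomic work-claiming counter are all assumptions made for this example; only the factor-out-plus-outer-loop structure comes from the commit message.

```cpp
// Sketch of the chunked flash-attention driver described by the commit.
// Assumed for illustration: signatures, chunk size, and threading model.
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Core FA loop, factored out: processes one contiguous range of output
// rows [ir0, ir1). The per-row work (QK^T, softmax, V accumulation) is
// elided here.
static void flash_atten_f16_one_chunk(int64_t ir0, int64_t ir1) {
    for (int64_t ir = ir0; ir < ir1; ++ir) {
        (void) ir; // ... per-row flash-attention work ...
    }
}

// Outer loop: split the rows into fixed-size chunks and let each thread
// claim the next unprocessed chunk from a shared atomic counter, so
// faster threads naturally pick up more chunks.
static void flash_atten_f16(int64_t nrows, int nthreads) {
    const int64_t chunk_size = 64; // assumed chunk granularity
    const int64_t nchunks    = (nrows + chunk_size - 1) / chunk_size;

    std::atomic<int64_t> next_chunk{0};

    auto worker = [&]() {
        for (;;) {
            const int64_t chunk = next_chunk.fetch_add(1); // claim a chunk
            if (chunk >= nchunks) {
                break; // all chunks handed out
            }
            const int64_t ir0 = chunk * chunk_size;
            const int64_t ir1 = std::min(ir0 + chunk_size, nrows);
            flash_atten_f16_one_chunk(ir0, ir1);
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back(worker);
    }
    for (auto & th : pool) {
        th.join();
    }
}

int main() {
    // Example invocation with assumed sizes.
    flash_atten_f16(/*nrows=*/1000, /*nthreads=*/4);
    return 0;
}
```

Chunking at a fixed granularity rather than statically splitting rows evenly across threads lets the load balance itself when some rows (e.g. longer attention spans) cost more than others.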