llama.cpp
9777032d - llama : separate compute buffer reserve from fattn check (#15696)

Commit
6 days ago
llama : separate compute buffer reserve from fattn check (#15696)

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.
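
The idea in the message above (assign graph nodes to backends first, allocate compute buffers only later) can be illustrated with a toy model. This is a minimal sketch and not ggml code: `ToyNode`, `ToySched`, `toy_split_graph`, `toy_reserve`, and `toy_fattn_on_gpu` are hypothetical names invented for illustration, and the only identifier taken from the commit is `ggml_backend_sched_split_graph()`, whose real signature is not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT ggml types.
struct ToyNode {
    std::string op;            // operation name, e.g. "FLASH_ATTN_EXT"
    std::size_t bytes;         // scratch memory the node would need
    bool        gpu_supported; // whether the toy "GPU" backend can run the op
};

struct ToySched {
    std::vector<int> backend_of;   // per node: 0 = GPU, 1 = CPU (set by split)
    std::size_t      reserved = 0; // bytes allocated so far (set by reserve)
};

// Phase 1: decide where each node runs. Allocates no compute buffers.
void toy_split_graph(ToySched & s, const std::vector<ToyNode> & g) {
    s.backend_of.clear();
    for (const ToyNode & n : g) {
        s.backend_of.push_back(n.gpu_supported ? 0 : 1);
    }
}

// Phase 2: split, then account for the compute buffers the graph needs.
void toy_reserve(ToySched & s, const std::vector<ToyNode> & g) {
    toy_split_graph(s, g);
    s.reserved = 0;
    for (const ToyNode & n : g) {
        s.reserved += n.bytes;
    }
}

// Check that needs only the split: do all Flash Attention nodes land on the GPU?
bool toy_fattn_on_gpu(ToySched & s, const std::vector<ToyNode> & g) {
    toy_split_graph(s, g);
    for (std::size_t i = 0; i < g.size(); ++i) {
        if (g[i].op == "FLASH_ATTN_EXT" && s.backend_of[i] != 0) {
            return false;
        }
    }
    return true;
}
```

In this sketch the check calls only the split phase, so it can run before any buffer is reserved; `toy_reserve` is called afterwards, once the caller has decided whether to keep Flash Attention enabled.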