llama.cpp
9777032d
- llama : separate compute buffer reserve from fattn check (#15696)
Commit
6 days ago
llama : separate compute buffer reserve from fattn check (#15696)

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.
References
#15696 - llama : separate compute buffer reserve from fattn check
Author
slaren
Parents
7d3c9f2b