llama.cpp
9777032d - llama : separate compute buffer reserve from fattn check (#15696)

Commit

66 days ago

llama : separate compute buffer reserve from fattn check (#15696)

Exposes ggml_backend_sched_split_graph() so that the graph can be split without allocating compute buffers, and uses it to split the graph for the automatic Flash Attention check.
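The sketch below illustrates why a split-only pass is enough for the automatic Flash Attention check. Deciding whether Flash Attention can be enabled only requires knowing which device each node is assigned to, and that does not require allocating compute buffers. It is a self-contained model, not llama.cpp code. The types and functions (struct node, split_graph, fattn_auto_ok) are hypothetical and do not mirror the real ggml or llama.cpp API. The sketch also assumes that the check compares the device each flash-attention node is scheduled on with the device that holds that layer, and that device 0 is a CPU fallback that supports every op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for graph nodes; NOT the ggml types. */
enum op_kind { OP_MUL_MAT, OP_FLASH_ATTN, OP_ADD };

struct node {
    enum op_kind op;
    int layer;   /* index of the layer this node belongs to */
    int device;  /* filled in by split_graph(); -1 means unassigned */
};

/* Assumption: only flash attention has per-device support; device 0 (CPU) supports everything. */
static bool device_supports(int device, enum op_kind op, const bool *fa_supported) {
    return op != OP_FLASH_ATTN || fa_supported[device];
}

/* Split-only pass: assign every node to a device without allocating any buffers.
   Prefer the layer's device; fall back to device 0 if the op is unsupported there. */
static void split_graph(struct node *nodes, size_t n,
                        const int *layer_device, const bool *fa_supported) {
    for (size_t i = 0; i < n; i++) {
        int preferred = layer_device[nodes[i].layer];
        nodes[i].device = device_supports(preferred, nodes[i].op, fa_supported) ? preferred : 0;
    }
}

/* Hypothetical auto Flash Attention check: returns true only if every
   flash-attention node stays on its layer's device after the split. */
static bool fattn_auto_ok(struct node *nodes, size_t n,
                          const int *layer_device, const bool *fa_supported) {
    split_graph(nodes, n, layer_device, fa_supported);
    for (size_t i = 0; i < n; i++) {
        assert(nodes[i].device >= 0); /* split_graph assigned every node */
        if (nodes[i].op == OP_FLASH_ATTN &&
            nodes[i].device != layer_device[nodes[i].layer]) {
            return false; /* FA would run on a different device: keep it disabled */
        }
    }
    return true;
}
```

For example, with two layers on device 1, fattn_auto_ok() returns true when device 1 supports flash attention. It returns false when device 1 does not, because the flash-attention nodes fall back to device 0. In both cases no memory is reserved. That mirrors the motivation stated in the commit: splitting the graph without reserving compute buffers.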

References

#15696 - llama : separate compute buffer reserve from fattn check

Author

slaren

Parents

7d3c9f2b
