llama.cpp
d6069051 - cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)

Commit
1 year ago
cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)

* Use CUDA memory pools for async alloc/dealloc.
* If the CUDA device doesn't support memory pools, then use the old implementation.
* Removed redundant cublasSetStream.

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
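The pattern described in the commit message (use the pool when the device supports it, fall back otherwise) can be sketched roughly as below. This is a minimal illustration, not the code from the commit: `device_buffer`, `device_has_mem_pool`, `buffer_alloc`, and `buffer_free` are hypothetical names, and plain `cudaMalloc`/`cudaFree` merely stands in for the "old implementation", which the message does not describe. The sketch also assumes CUDA 11.2 or newer and that the caller has already made `device` current with `cudaSetDevice`.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical record of one allocation and which path produced it.
struct device_buffer {
    void * ptr       = nullptr;
    bool   from_pool = false;
};

// Hypothetical helper: true if `device` reports stream-ordered memory pool support.
static bool device_has_mem_pool(int device) {
    int supported = 0;
    cudaError_t err = cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device);
    return err == cudaSuccess && supported != 0;
}

// Allocate `size` bytes. Prefer the device's default memory pool, ordered on `stream`;
// otherwise fall back to a synchronous cudaMalloc (assumed stand-in for the old path).
static cudaError_t buffer_alloc(device_buffer * buf, size_t size, int device, cudaStream_t stream) {
    buf->ptr       = nullptr;
    buf->from_pool = false;

    if (device_has_mem_pool(device)) {
        cudaMemPool_t pool = nullptr;
        if (cudaDeviceGetDefaultMemPool(&pool, device) == cudaSuccess &&
            cudaMallocFromPoolAsync(&buf->ptr, size, pool, stream) == cudaSuccess) {
            buf->from_pool = true;
            return cudaSuccess;
        }
        buf->ptr = nullptr;
        (void) cudaGetLastError(); // clear the error before trying the fallback
    }

    // Fallback: allocates on the current device, so `device` must already be current.
    return cudaMalloc(&buf->ptr, size);
}

// Release the buffer through the same path that allocated it.
static cudaError_t buffer_free(device_buffer * buf, cudaStream_t stream) {
    if (buf->ptr == nullptr) {
        return cudaSuccess;
    }
    cudaError_t err = buf->from_pool ? cudaFreeAsync(buf->ptr, stream)
                                     : cudaFree(buf->ptr);
    buf->ptr       = nullptr;
    buf->from_pool = false;
    return err;
}
```

Recording `from_pool` per allocation keeps each free on the same path as its allocation. Because pool allocations and frees are ordered on `stream`, any kernel that uses the buffer should be launched on that stream or synchronized with it.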