llama.cpp
d6069051 - cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)

Commit

1 year ago

cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)

* Using CUDA memory pools for async alloc/dealloc.
* If the CUDA device doesn't support memory pools, then use the old implementation.
* Removed redundant cublasSetStream

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>

References

#3903 - CUDA memory pool with async memory allocation/deallocation

Author

young-developer

Parents

4ff1046d
