llama.cpp 27c34c01 - cuda : reduce mallocs in cublasGemmBatchedEx branch
Commit
2 years ago
cuda : reduce mallocs in cublasGemmBatchedEx branch
References
#3749 - cuda : add batched cuBLAS GEMM for faster attention
Author
ggerganov
Committer
ggerganov
Parents
3d297c1a