llama.cpp
c13fcfbf
- cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)
Commit
1 year ago
References
#3749 - cuda : add batched cuBLAS GEMM for faster attention
Author
ggerganov
Parents
84d4ca0e
Files (1)
ggml-cuda.cu