llama.cpp
f5ef5cfb - ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)
Commit
1 year ago
ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)

* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16
* rename CC_TURING to CC_VOLTA
* disable fp16 mat mul completely with multi GPU
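The change routes the cuBLAS path for quantized tensors through fp16: the quantized weights are dequantized to half precision and the GEMM runs with fp16 operands via cublasGemmEx, rather than converting everything to fp32 first. Below is a minimal sketch of that call pattern, assuming column-major cuBLAS layout; the dequantize_to_f16 helper, the scratch buffer, and the tensor-op algorithm choice are illustrative assumptions, not llama.cpp's exact code.

// Sketch: fp16 cuBLAS GEMM over dequantized weights (illustrative, not the exact llama.cpp code).
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Hypothetical helper: dequantize a quantized block buffer into fp16 on the device.
void dequantize_to_f16(const void * src_q, half * dst_f16, int64_t n, cudaStream_t stream);

void mat_mul_q_f16(cublasHandle_t handle, cudaStream_t stream,
                   const void * A_q, const half * B, half * C,
                   int m, int n, int k,
                   half * A_f16 /* scratch buffer, m*k elements */) {
    // 1. Dequantize the quantized weight matrix to fp16 instead of fp32,
    //    halving the size of the temporary and the memory traffic.
    dequantize_to_f16(A_q, A_f16, (int64_t) m * k, stream);

    const half alpha = __float2half(1.0f);
    const half beta  = __float2half(0.0f);

    cublasSetStream(handle, stream);
    // 2. Run the GEMM entirely in fp16; on Volta and newer this can use tensor cores.
    cublasGemmEx(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 A_f16, CUDA_R_16F, k,
                 B,     CUDA_R_16F, k,
                 &beta,
                 C,     CUDA_R_16F, m,
                 CUBLAS_COMPUTE_16F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}

The CC_TURING to CC_VOLTA rename is consistent with this: fp16 tensor-core GEMMs are available from Volta (compute capability 7.0) onward, so gating the fp16 path on Turing was stricter than necessary.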
References
#3412 - ggml-cuda : perform cublas mat mul of quantized types as f16
Author
slaren
Parents
40e07a60