llama.cpp
f5ef5cfb - ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)

Commit (1 year ago)
ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)

* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16
* rename CC_TURING to CC_VOLTA
* disable fp16 mat mul completely with multi GPU
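The change this commit describes can be sketched as follows: the quantized weights are dequantized to half precision on the GPU and the matrix product is then done with `cublasGemmEx` using f16 inputs and f16 compute. This is an illustrative sketch only, not the commit's actual code; the function name `matmul_quantized_as_f16` and the leading-dimension/transpose choices are assumptions, and the f16 path is gated on compute capability >= Volta (the `CC_VOLTA` constant renamed here), with a fallback to f32 otherwise.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Hypothetical helper (not from the commit): multiply an already-dequantized
// f16 weight matrix A (k x m, column-major, transposed in the GEMM) by f16
// activations B (k x n), writing an f16 result C (m x n).
void matmul_quantized_as_f16(cublasHandle_t handle,
                             const half *A_f16,  // weights dequantized to f16
                             const half *B_f16,  // activations converted to f16
                             half *C_f16,        // f16 output
                             int m, int n, int k) {
    const half alpha = __float2half(1.0f);
    const half beta  = __float2half(0.0f);

    // f16 storage and f16 compute: on Volta and newer this maps onto tensor
    // cores; older GPUs would instead take the f32 code path.
    cublasGemmEx(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 A_f16, CUDA_R_16F, k,
                 B_f16, CUDA_R_16F, k,
                 &beta,
                 C_f16, CUDA_R_16F, m,
                 CUBLAS_COMPUTE_16F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```

Per the third bullet, a multi-GPU setup would skip this f16 path entirely and use the f32 GEMM instead.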