llama.cpp
f5ef5cfb - ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)

Commit (1 year ago)
ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)

* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16
* rename CC_TURING to CC_VOLTA
* disable fp16 mat mul completely with multi GPU
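The change this commit describes can be sketched as follows: the quantized weights are dequantized to half precision on the GPU and the matrix product is then done with `cublasGemmEx` using f16 inputs and f16 compute. This is an illustrative sketch only, not the commit's actual code; the function name `matmul_quantized_as_f16` and the leading-dimension/transpose choices are assumptions, and the f16 path is gated on compute capability >= Volta (the `CC_VOLTA` constant renamed here), with a fallback to f32 otherwise.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Hypothetical helper (not from the commit): multiply an already-dequantized
// f16 weight matrix A (k x m, column-major, transposed in the GEMM) by f16
// activations B (k x n), writing an f16 result C (m x n).
void matmul_quantized_as_f16(cublasHandle_t handle,
                             const half *A_f16,  // weights dequantized to f16
                             const half *B_f16,  // activations converted to f16
                             half *C_f16,        // f16 output
                             int m, int n, int k) {
    const half alpha = __float2half(1.0f);
    const half beta  = __float2half(0.0f);

    // f16 storage and f16 compute: on Volta and newer this maps onto tensor
    // cores; older GPUs would instead take the f32 code path.
    cublasGemmEx(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 A_f16, CUDA_R_16F, k,
                 B_f16, CUDA_R_16F, k,
                 &beta,
                 C_f16, CUDA_R_16F, m,
                 CUBLAS_COMPUTE_16F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```

Per the third bullet, a multi-GPU setup would skip this f16 path entirely and use the f32 GEMM instead.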