ggml-cuda : perform cublas mat mul of quantized types as f16 #3412
ggml-cuda : perform cublas matrix multiplication of quantized types a…
62832c57
ggerganov
approved these changes
on 2023-09-30
rename CC_TURING to CC_VOLTA
59937e45
disable fp16 mat mul completely with multi GPU
39ddda27
slaren
merged
f5ef5cfb
into master 2 years ago
slaren
deleted the cublas-q-f16 branch 2 years ago