llama.cpp
ggml-cuda : perform cublas mat mul of quantized types as f16 #3412
Merged

slaren merged 3 commits into master from cublas-q-f16
slaren committed 62832c57: ggml-cuda : perform cublas matrix multiplication of quantized types a…
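The core of this commit: the quantized weight matrix is converted to f16 rather than f32, and the product is computed in half precision through cuBLAS, which enables tensor core use and halves the size of the dequantized buffer. Below is a minimal sketch of that call pattern, assuming the operands have already been converted to half on the device; the function name and setup are illustrative, not the PR's exact code.

```cpp
// Sketch: f16 GEMM via cublasGemmEx. Assumes src0 was dequantized to
// half (src0_f16) and src1 converted f32 -> f16 (src1_f16); dst_f16
// receives the half-precision result.
#include <cublas_v2.h>
#include <cuda_fp16.h>

void gemm_f16(cublasHandle_t handle,
              const half * src0_f16, const half * src1_f16, half * dst_f16,
              int m, int n, int k, int lda, int ldb, int ldc) {
    const half alpha = __float2half(1.0f);
    const half beta  = __float2half(0.0f);

    // All operand types and the compute type are f16, so tensor cores
    // can be used on hardware that supports them (CC >= 7.0).
    cublasGemmEx(handle,
                 CUBLAS_OP_T, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 src0_f16, CUDA_R_16F, lda,
                 src1_f16, CUDA_R_16F, ldb,
                 &beta,
                 dst_f16,  CUDA_R_16F, ldc,
                 CUBLAS_COMPUTE_16F,
                 CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```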
ggerganov approved these changes on 2023-09-30
slaren committed 59937e45: rename CC_TURING to CC_VOLTA
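The rename matters because the f16 cuBLAS path is gated on compute capability, and half-precision tensor core math is available starting with Volta (CC 7.0), not only Turing: the constant's old name understated where the path can run. A sketch of that kind of gate follows, assuming a threshold of CC 7.0 as the commit's name implies; the surrounding dispatch logic is assumed.

```cpp
// Illustrative gate on compute capability; CC_VOLTA = 700 matches the
// commit's rename, the helper name is hypothetical.
#define CC_VOLTA 700

static bool device_supports_f16_mat_mul(int compute_capability) {
    // f16 math (and tensor cores) are available from Volta onward
    return compute_capability >= CC_VOLTA;
}
```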
slaren committed 39ddda27: disable fp16 mat mul completely with multi GPU
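Per the commit message, the f16 path is skipped entirely when more than one GPU is in use, falling back to the existing f32 path. Roughly, the dispatch check gains a device-count condition; this sketch mirrors the commit's intent, and the names are illustrative rather than the file's exact globals.

```cpp
// Sketch of the dispatch guard after this commit: take the f16 cuBLAS
// path only on a single GPU with sufficient compute capability,
// otherwise use the f32 path.
static bool use_f16_gemm(int device_count, int compute_capability) {
    if (device_count > 1) {
        return false; // multi-GPU: always use the f32 path
    }
    return compute_capability >= 700; // CC_VOLTA
}
```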
slaren merged f5ef5cfb into master 2 years ago
slaren deleted the cublas-q-f16 branch 2 years ago
