llama.cpp
ggml-cuda : perform cublas mat mul of quantized types as f16 #3412
Merged

slaren merged 3 commits into master from cublas-q-f16
slaren committed 62832c57: ggml-cuda : perform cublas matrix multiplication of quantized types a…
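The core of this commit: the quantized weight matrix is converted to f16 rather than f32, and the product is computed in half precision through cuBLAS, which enables tensor core use and halves the size of the dequantized buffer. Below is a minimal sketch of that call pattern, assuming the operands have already been converted to half on the device; the function name and setup are illustrative, not the PR's exact code.

```cpp
// Sketch: f16 GEMM via cublasGemmEx. Assumes src0 was dequantized to
// half (src0_f16) and src1 converted f32 -> f16 (src1_f16); dst_f16
// receives the half-precision result.
#include <cublas_v2.h>
#include <cuda_fp16.h>

void gemm_f16(cublasHandle_t handle,
              const half * src0_f16, const half * src1_f16, half * dst_f16,
              int m, int n, int k, int lda, int ldb, int ldc) {
    const half alpha = __float2half(1.0f);
    const half beta  = __float2half(0.0f);

    // All operand types and the compute type are f16, so tensor cores
    // can be used on hardware that supports them (CC >= 7.0).
    cublasGemmEx(handle,
                 CUBLAS_OP_T, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 src0_f16, CUDA_R_16F, lda,
                 src1_f16, CUDA_R_16F, ldb,
                 &beta,
                 dst_f16,  CUDA_R_16F, ldc,
                 CUBLAS_COMPUTE_16F,
                 CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```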
ggerganov approved these changes on 2023-09-30
slaren committed 59937e45: rename CC_TURING to CC_VOLTA
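The rename matters because the f16 cuBLAS path is gated on compute capability, and half-precision tensor core math is available starting with Volta (CC 7.0), not only Turing: the constant's old name understated where the path can run. A sketch of that kind of gate follows, assuming a threshold of CC 7.0 as the commit's name implies; the surrounding dispatch logic is assumed.

```cpp
// Illustrative gate on compute capability; CC_VOLTA = 700 matches the
// commit's rename, the helper name is hypothetical.
#define CC_VOLTA 700

static bool device_supports_f16_mat_mul(int compute_capability) {
    // f16 math (and tensor cores) are available from Volta onward
    return compute_capability >= CC_VOLTA;
}
```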
slaren committed 39ddda27: disable fp16 mat mul completely with multi GPU
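Per the commit message, the f16 path is skipped entirely when more than one GPU is in use, falling back to the existing f32 path. Roughly, the dispatch check gains a device-count condition; this sketch mirrors the commit's intent, and the names are illustrative rather than the file's exact globals.

```cpp
// Sketch of the dispatch guard after this commit: take the f16 cuBLAS
// path only on a single GPU with sufficient compute capability,
// otherwise use the f32 path.
static bool use_f16_gemm(int device_count, int compute_capability) {
    if (device_count > 1) {
        return false; // multi-GPU: always use the f32 path
    }
    return compute_capability >= 700; // CC_VOLTA
}
```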
slaren merged f5ef5cfb into master 2 years ago
slaren deleted the cublas-q-f16 branch 2 years ago
