llama.cpp
Improve cuBLAS performance by dequantizing on the GPU #1065
Merged

slaren merged 4 commits into ggml-org:master from slaren:cuda-dq
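For context, the idea behind this PR is to dequantize quantized weight matrices directly on the GPU and hand the resulting fp32 data to cuBLAS, rather than dequantizing on the CPU first. The sketch below is a minimal illustration of that approach under stated assumptions, not the code from this PR; the Q4_0-style block layout and all function and buffer names (dequantize_q4_0, mul_mat_q4_0_cublas, d_w_f32, and so on) are hypothetical.

```cuda
// Minimal sketch (not this PR's actual code): dequantize Q4_0-style blocks on the
// device, then run the matrix multiplication with cuBLAS SGEMM.
#include <cuda_runtime.h>
#include <cublas_v2.h>

#define QK 32  // assumed: 32 weights per quantization block

// Assumed Q4_0-style block: one fp32 scale plus 16 bytes of packed 4-bit values.
struct block_q4_0 {
    float d;                   // scale
    unsigned char qs[QK / 2];  // packed nibbles
};

// One thread dequantizes one block of QK weights into fp32.
__global__ void dequantize_q4_0(const block_q4_0 *x, float *y, int nb) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nb) return;
    float d = x[i].d;
    for (int j = 0; j < QK / 2; ++j) {
        unsigned char q = x[i].qs[j];
        y[i * QK + 2 * j + 0] = ((q & 0x0F) - 8) * d;  // low nibble
        y[i * QK + 2 * j + 1] = ((q >> 4)   - 8) * d;  // high nibble
    }
}

// Dequantize the quantized weights on the GPU, then call SGEMM, so the CPU never
// has to expand the weights to fp32.
void mul_mat_q4_0_cublas(cublasHandle_t handle,
                         const block_q4_0 *d_w_q,  // quantized weights on device (m x k)
                         const float *d_x,         // activations on device (k x n)
                         float *d_y,               // output on device (m x n)
                         float *d_w_f32,           // fp32 scratch buffer, m * k floats
                         int m, int k, int n, cudaStream_t stream) {
    int nb = m * k / QK;  // number of quantization blocks
    dequantize_q4_0<<<(nb + 255) / 256, 256, 0, stream>>>(d_w_q, d_w_f32, nb);

    const float alpha = 1.0f, beta = 0.0f;
    cublasSetStream(handle, stream);
    // Column-major SGEMM; the exact transpose flags depend on the matrix layout.
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                m, n, k, &alpha,
                d_w_f32, k, d_x, k, &beta, d_y, m);
}
```

The intended benefit of this arrangement is that the dequantization work runs on the GPU instead of the CPU, and only the compact quantized data has to cross the PCIe bus before the cuBLAS call.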
slaren: Improve cuBLAS performance with quantized models by dequantizing on t… (359b0560)
slaren: Remove unused parameters (891af05e)
Green-Sky commented on 2023-04-19
ggerganov approved these changes on 2023-04-19
slaren: Fix possible synchronization issue (95cf9597)
slaren: Fix windows build (18337719)
slaren merged 02d69881 into master 2 years ago
slaren deleted the cuda-dq branch 2 years ago
jon-chuang commented on 2023-04-26
