llama.cpp
Improve cuBLAS performance by dequantizing on the GPU #1065
Merged

slaren merged 4 commits into ggml-org:master from slaren:cuda-dq
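For context, the idea behind this PR is to dequantize quantized weight matrices directly on the GPU and hand the resulting fp32 data to cuBLAS, rather than dequantizing on the CPU first. The sketch below is a minimal illustration of that approach under stated assumptions, not the code from this PR; the Q4_0-style block layout and all function and buffer names (dequantize_q4_0, mul_mat_q4_0_cublas, d_w_f32, and so on) are hypothetical.

```cuda
// Minimal sketch (not this PR's actual code): dequantize Q4_0-style blocks on the
// device, then run the matrix multiplication with cuBLAS SGEMM.
#include <cuda_runtime.h>
#include <cublas_v2.h>

#define QK 32  // assumed: 32 weights per quantization block

// Assumed Q4_0-style block: one fp32 scale plus 16 bytes of packed 4-bit values.
struct block_q4_0 {
    float d;                   // scale
    unsigned char qs[QK / 2];  // packed nibbles
};

// One thread dequantizes one block of QK weights into fp32.
__global__ void dequantize_q4_0(const block_q4_0 *x, float *y, int nb) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nb) return;
    float d = x[i].d;
    for (int j = 0; j < QK / 2; ++j) {
        unsigned char q = x[i].qs[j];
        y[i * QK + 2 * j + 0] = ((q & 0x0F) - 8) * d;  // low nibble
        y[i * QK + 2 * j + 1] = ((q >> 4)   - 8) * d;  // high nibble
    }
}

// Dequantize the quantized weights on the GPU, then call SGEMM, so the CPU never
// has to expand the weights to fp32.
void mul_mat_q4_0_cublas(cublasHandle_t handle,
                         const block_q4_0 *d_w_q,  // quantized weights on device (m x k)
                         const float *d_x,         // activations on device (k x n)
                         float *d_y,               // output on device (m x n)
                         float *d_w_f32,           // fp32 scratch buffer, m * k floats
                         int m, int k, int n, cudaStream_t stream) {
    int nb = m * k / QK;  // number of quantization blocks
    dequantize_q4_0<<<(nb + 255) / 256, 256, 0, stream>>>(d_w_q, d_w_f32, nb);

    const float alpha = 1.0f, beta = 0.0f;
    cublasSetStream(handle, stream);
    // Column-major SGEMM; the exact transpose flags depend on the matrix layout.
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                m, n, k, &alpha,
                d_w_f32, k, d_x, k, &beta, d_y, m);
}
```

The intended benefit of this arrangement is that the dequantization work runs on the GPU instead of the CPU, and only the compact quantized data has to cross the PCIe bus before the cuBLAS call.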
slaren: Improve cuBLAS performance with quantized models by dequantizing on t… (359b0560)
slaren: Remove unused parameters (891af05e)
Green-Sky commented on 2023-04-19
ggerganov approved these changes on 2023-04-19
slaren: Fix possible synchronization issue (95cf9597)
slaren: Fix windows build (18337719)
slaren merged 02d69881 into master 2 years ago
slaren deleted the cuda-dq branch 2 years ago
jon-chuang commented on 2023-04-26
