llama.cpp
cuBLAS: use host pinned memory and dequantize while copying
#1207
Merged

slaren merged 5 commits into ggml-org:master from slaren:quant-stream
Participants: slaren, dfyz, SlyEcho
slaren force pushed 2 years ago
ggerganov approved these changes on 2023-04-28
slaren cuBLAS: dequantize simultaneously while copying memory
d3fd04e9
slaren cuBLAS: use host pinned memory
2dd6deeb
slaren cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory
d5d6a808
slaren cuBLAS: also pin kv cache
3cf2247d
slaren fix rebase
38a021fa
slaren force pushed to 38a021fa 2 years ago
slaren merged 7fc50c05 into master 2 years ago
slaren deleted the quant-stream branch 2 years ago
