llama.cpp
cuBLAS: use host pinned memory and dequantize while copying
#1207

Merged

slaren merged 5 commits into ggml-org:master from slaren:quant-stream

ggerganov approved these changes on 2023-04-28

cuBLAS: dequantize simultaneously while copying memory

d3fd04e9

cuBLAS: use host pinned memory

2dd6deeb

cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

d5d6a808

cuBLAS: also pin kv cache

3cf2247d

fix rebase

38a021fa

slaren force pushed to 38a021fa 2 years ago

slaren merged 7fc50c05 into master 2 years ago

slaren deleted the quant-stream branch 2 years ago

Reviewers

ggerganov
