cuBLAS: use host pinned memory and dequantize while copying #1207
slaren force pushed 2 years ago
ggerganov approved these changes on 2023-04-28
cuBLAS: dequantize simultaneously while copying memory (d3fd04e9)
cuBLAS: use host pinned memory (2dd6deeb)
cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory (d5d6a808)
cuBLAS: also pin kv cache (3cf2247d)
fix rebase (38a021fa)
slaren force pushed to 38a021fa 2 years ago
slaren merged 7fc50c05 into master 2 years ago
slaren
deleted the quant-stream branch 2 years ago