llama.cpp
7fc50c05 - cuBLAS: use host pinned memory and dequantize while copying (#1207)

Commit
2 years ago
cuBLAS: use host pinned memory and dequantize while copying (#1207)

* cuBLAS: dequantize simultaneously while copying memory
* cuBLAS: use host pinned memory
* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory
* cuBLAS: also pin kv cache
* fix rebase
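
For readers unfamiliar with the two ideas named in the commit message, here is a minimal standalone CUDA sketch. It is not the code from this commit. The struct `block_q4`, the kernel `dequantize_q4`, the 32-value block size, and the nibble layout are all illustrative assumptions. The only real API used is the CUDA runtime: `cudaMallocHost` allocates page-locked (pinned) host memory, which `cudaMemcpyAsync` needs in order to overlap with kernel execution. The quantized data is sent in chunks on two streams so that one chunk can be dequantized on the GPU while the next is still being copied.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

#define QK 32  // values per quantized block (assumed for illustration)

// Hypothetical 4-bit block: one float scale plus 32 packed nibbles.
struct block_q4 { float d; uint8_t qs[QK / 2]; };

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); return 1; } } while (0)

// One thread expands one quantized block into QK floats.
__global__ void dequantize_q4(const block_q4 *x, float *y, int nb) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nb) return;
    const float d = x[i].d;
    for (int j = 0; j < QK / 2; ++j) {
        const uint8_t v = x[i].qs[j];
        y[i * QK + 2 * j + 0] = ((v & 0x0F) - 8) * d;  // low nibble
        y[i * QK + 2 * j + 1] = ((v >> 4) - 8) * d;    // high nibble
    }
}

int main() {
    const int nb = 4096, nchunks = 4, cb = nb / nchunks;
    block_q4 *h_x = nullptr, *d_x = nullptr;
    float *h_y = nullptr, *d_y = nullptr;

    // Pinned host buffers: required for truly asynchronous copies.
    CHECK(cudaMallocHost((void **)&h_x, nb * sizeof(block_q4)));
    CHECK(cudaMallocHost((void **)&h_y, (size_t)nb * QK * sizeof(float)));
    CHECK(cudaMalloc((void **)&d_x, nb * sizeof(block_q4)));
    CHECK(cudaMalloc((void **)&d_y, (size_t)nb * QK * sizeof(float)));

    for (int i = 0; i < nb; ++i) {
        h_x[i].d = 0.5f;
        for (int j = 0; j < QK / 2; ++j) h_x[i].qs[j] = 0x9A;
    }

    cudaStream_t st[2];
    CHECK(cudaStreamCreate(&st[0]));
    CHECK(cudaStreamCreate(&st[1]));

    // Alternate chunks between two streams so the copy of one chunk
    // can overlap with the dequantization of the previous one.
    for (int c = 0; c < nchunks; ++c) {
        cudaStream_t s = st[c % 2];
        CHECK(cudaMemcpyAsync(d_x + c * cb, h_x + c * cb, cb * sizeof(block_q4),
                              cudaMemcpyHostToDevice, s));
        dequantize_q4<<<(cb + 127) / 128, 128, 0, s>>>(
            d_x + c * cb, d_y + (size_t)c * cb * QK, cb);
        CHECK(cudaGetLastError());
    }
    CHECK(cudaStreamSynchronize(st[0]));
    CHECK(cudaStreamSynchronize(st[1]));

    CHECK(cudaMemcpy(h_y, d_y, (size_t)nb * QK * sizeof(float), cudaMemcpyDeviceToHost));
    printf("y[0] = %f, y[1] = %f\n", h_y[0], h_y[1]);

    cudaStreamDestroy(st[0]); cudaStreamDestroy(st[1]);
    cudaFree(d_x); cudaFree(d_y);
    cudaFreeHost(h_x); cudaFreeHost(h_y);
    return 0;
}
```

Sending the quantized blocks rather than pre-expanded floats keeps the host-to-device transfer small, and the expansion to floats happens on the device just before the data would be handed to a matrix multiply. The sketch stops short of calling cuBLAS.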