llama.cpp
7fc50c05 - cuBLAS: use host pinned memory and dequantize while copying (#1207)

Commit

3 years ago

cuBLAS: use host pinned memory and dequantize while copying (#1207)

* cuBLAS: dequantize simultaneously while copying memory
* cuBLAS: use host pinned memory
* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory
* cuBLAS: also pin kv cache
* fix rebase

References

#1207 - cuBLAS: use host pinned memory and dequantize while copying

Author

slaren

Parents

b1ee8f59
