llama.cpp
cuBLAS: refactor and optimize f16 mat mul performance
#1259
Merged
