llama.cpp
58b367c2 - cuBLAS: refactor and optimize f16 mat mul performance (#1259)

cuBLAS: refactor and optimize f16 mat mul performance (#1259)

* cuBLAS: refactor, convert fp16 to fp32 on device
* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16
* fix build
* cuBLAS: update block_q5_1