cuBLAS: refactor and optimize f16 mat mul performance #1259
slaren
marked this pull request as draft 3 years ago
cuBLAS: refactor, convert fp16 to fp32 on device
cf93fdcf
cuBLAS: use multiple streams, choose smartly between mul_mat_q and mu…
a9ad140c
fix build
4cd0a480
slaren
force pushed to 4cd0a480 3 years ago
slaren
marked this pull request as ready for review 3 years ago
ggerganov
approved these changes on 2023-05-01
cuBLAS: update block_q5_1
a79756b2
slaren
merged 58b367c2 into master 3 years ago
slaren
deleted the cuda-mat-mul branch 3 years ago