llama.cpp
cuBLAS: refactor and optimize f16 mat mul performance
#1259

Merged

slaren merged 4 commits into ggml-org:master from slaren:cuda-mat-mul

slaren marked this pull request as draft 3 years ago

cuBLAS: refactor, convert fp16 to fp32 on device

cf93fdcf

cuBLAS: use multiple streams, choose smartly between mul_mat_q and mu…

a9ad140c

fix build

4cd0a480

slaren force pushed to 4cd0a480 3 years ago

slaren marked this pull request as ready for review 3 years ago

ggerganov commented on 2023-05-01

ggerganov approved these changes on 2023-05-01

cuBLAS: update block_q5_1

a79756b2

slaren merged 58b367c2 into master 3 years ago

slaren deleted the cuda-mat-mul branch 3 years ago

Reviewers

ggerganov
