llama.cpp
cuBLAS: refactor and optimize f16 mat mul performance
#1259
Merged

slaren merged 4 commits into ggml-org:master from slaren:cuda-mat-mul
slaren marked this pull request as draft 3 years ago
slaren cuBLAS: refactor, convert fp16 to fp32 on device
cf93fdcf
slaren cuBLAS: use multiple streams, choose smartly between mul_mat_q and mu…
a9ad140c
slaren fix build
4cd0a480
slaren force pushed to 4cd0a480 3 years ago
slaren marked this pull request as ready for review 3 years ago
ggerganov commented on 2023-05-01
ggerganov approved these changes on 2023-05-01
slaren cuBLAS: update block_q5_1
a79756b2
slaren merged 58b367c2 into master 3 years ago
slaren deleted the cuda-mat-mul branch 3 years ago

Assignees: No one assigned