llama.cpp
Use BLAS to implement ggml_compute_forward_out_prod_f32 for matrix src0, src1 (finetuning speedup ~5x).
#4079
Merged

gwjr Remove logically superfluous assertions and order by dimension
d75eae63
gwjr Use cblas_sgemm() to implement ggml_compute_forward_out_prod()
2f0c5dca
gwjr Remove ggml_compute_forward_out_prod_use_blas(), fix compiling errors…
e5c1f026
ggerganov approved these changes on 2023-11-16
ggerganov added the performance label
ggerganov added the training label
gwjr Add openBLAS support for sgemm() in compute_forward_out_prod()
da122af0
ggerganov merged 3e916a07 into master 2 years ago
