llama.cpp
Use BLAS to implement ggml_compute_forward_out_prod_f32 for matrix src0, src1 (finetuning speedup ~5x).
#4079

Merged

ggerganov merged 4 commits into ggml-org:master from gwjr:out-prod-using-blas
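
The idea in the title, replacing a sum of outer products with one matrix multiply, can be sketched as follows. This is a minimal illustration and not the code from this PR: the function names `out_prod_naive`, `out_prod_gemm`, and `max_abs_diff` are hypothetical, and the row-major layout (`a` is K x M, `b` is K x N, `dst` is N x M) is an assumption of the sketch rather than ggml's actual tensor layout. To stay self-contained, the GEMM form is written as a plain loop; the comment shows the `cblas_sgemm` call that would compute the same product when a BLAS library is linked.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Naive form: accumulate K rank-1 (outer product) updates.
 * Assumed layout (sketch only): a is K x M, b is K x N, dst is N x M,
 * all row-major, so dst[j][i] = sum_k b[k][j] * a[k][i]. */
static void out_prod_naive(int M, int N, int K,
                           const float *a, const float *b, float *dst) {
    assert(M > 0 && N > 0 && K > 0);
    memset(dst, 0, sizeof(float) * (size_t)M * (size_t)N);
    for (int k = 0; k < K; k++) {
        for (int j = 0; j < N; j++) {
            for (int i = 0; i < M; i++) {
                dst[j * M + i] += b[k * N + j] * a[k * M + i];
            }
        }
    }
}

/* GEMM form: the same result written as one product, dst = B^T * A.
 * With a BLAS library this loop nest could be replaced by:
 *   cblas_sgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
 *               N, M, K, 1.0f, b, N, a, M, 0.0f, dst, M);
 * The plain loop below is a stand-in so the sketch needs no BLAS. */
static void out_prod_gemm(int M, int N, int K,
                          const float *a, const float *b, float *dst) {
    assert(M > 0 && N > 0 && K > 0);
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < M; i++) {
            float acc = 0.0f;
            for (int k = 0; k < K; k++) {
                acc += b[k * N + j] * a[k * M + i];
            }
            dst[j * M + i] = acc;
        }
    }
}

/* Largest absolute element-wise difference between two buffers. */
static float max_abs_diff(const float *x, const float *y, int n) {
    float m = 0.0f;
    for (int i = 0; i < n; i++) {
        float d = fabsf(x[i] - y[i]);
        if (d > m) m = d;
    }
    return m;
}
```

Both functions compute the same matrix. The GEMM form exposes the whole operation as a single call that an optimized BLAS can block and vectorize, which is the kind of change the title credits for the finetuning speedup.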

Remove logically superfluous assertions and order by dimension

d75eae63

Use cblas_sgemm() to implement ggml_compute_forward_out_prod()

2f0c5dca

Remove ggml_compute_forward_out_prod_use_blas(), fix compiling errors…

e5c1f026

ggerganov approved these changes on 2023-11-16

ggerganov added performance

ggerganov added training

Add openBLAS support for sgemm() in compute_forward_out_prod()

da122af0

ggerganov merged 3e916a07 into master 2 years ago
