finetune : speed up ggml_compute_forward_out_prod_f32 via BLAS (#4079)
* Remove logically superfluous assertions and order the rest by dimension
* Use cblas_sgemm() to implement ggml_compute_forward_out_prod()
* Remove ggml_compute_forward_out_prod_use_blas(), fix compile errors with cmake/zig, remove trailing whitespace
* Add OpenBLAS support for sgemm() in compute_forward_out_prod()
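
For reviewers, a minimal sketch of why an out-product maps onto a single sgemm call. It is not the code in this change. It assumes dst[i1][i0] = sum_k a[k][i0] * b[k][i1] with all matrices stored row-major, and the names out_prod_naive, out_prod_gemm and sgemm_tn_ref are hypothetical. sgemm_tn_ref is a plain-C stand-in for cblas_sgemm(CblasRowMajor, CblasTrans, CblasNoTrans, ...) so the sketch builds without a BLAS library.

```c
#include <assert.h>

/* Naive out-product: dst is n1 x n0, a is k x n0, b is k x n1 (row-major).
 * dst[i1][i0] = sum over kk of a[kk][i0] * b[kk][i1]. */
static void out_prod_naive(const float *a, const float *b, float *dst,
                           int n0, int n1, int k) {
    for (int i1 = 0; i1 < n1; i1++) {
        for (int i0 = 0; i0 < n0; i0++) {
            float sum = 0.0f;
            for (int kk = 0; kk < k; kk++) {
                sum += a[kk * n0 + i0] * b[kk * n1 + i1];
            }
            dst[i1 * n0 + i0] = sum;
        }
    }
}

/* Hypothetical stand-in for
 *   cblas_sgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
 *               M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);
 * It computes C = alpha * A^T * B + beta * C, where A is stored as K x M. */
static void sgemm_tn_ref(int M, int N, int K, float alpha,
                         const float *A, int lda,
                         const float *B, int ldb,
                         float beta, float *C, int ldc) {
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < N; n++) {
            float acc = 0.0f;
            for (int kk = 0; kk < K; kk++) {
                acc += A[kk * lda + m] * B[kk * ldb + n];
            }
            /* As in BLAS, C is not read when beta is zero. */
            C[m * ldc + n] = (beta == 0.0f) ? alpha * acc
                                            : alpha * acc + beta * C[m * ldc + n];
        }
    }
}

/* Same result as out_prod_naive, expressed as one GEMM: dst = b^T * a. */
static void out_prod_gemm(const float *a, const float *b, float *dst,
                          int n0, int n1, int k) {
    assert(n0 > 0 && n1 > 0 && k > 0);
    sgemm_tn_ref(n1, n0, k, 1.0f, b, n1, a, n0, 0.0f, dst, n0);
}
```

With a BLAS library linked, the call inside out_prod_gemm would be replaced by the equivalent cblas_sgemm call using the same M, N, K and leading dimensions.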