llama.cpp
780e24a2 - ggml : parallelize FP32 conversion when using BLAS (#5045)

Commit

2 years ago

ggml : parallelize FP32 conversion when using BLAS (#5045)

* make GGML_TASK_INIT phase can be run in multithread
* multithreaded dequantize in mul_mat when using blas library
* minor fixes
* update outdated comment
* fix coding style
* simplify code

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
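The idea named in the commit message, converting quantized data to FP32 on several threads before the matrix is handed to a BLAS routine, can be illustrated with a minimal sketch. This is not the code from the commit. The row format (int8 values with one float scale per row), the `convert_job` struct, the functions `convert_rows` and `convert_to_f32_parallel`, the `MAX_THREADS` cap, and the interleaved row split are all hypothetical and chosen only for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quantized layout for illustration only (not a ggml type):
   each row holds `cols` int8 values that share one float scale. */
typedef struct {
    const int8_t *q;      /* rows * cols quantized values */
    const float  *scale;  /* one scale per row */
    float        *out;    /* rows * cols FP32 output */
    size_t rows, cols;
    int ith, nth;         /* this worker's index and the worker count */
} convert_job;

/* Worker: converts rows ith, ith + nth, ith + 2*nth, ... to FP32.
   Workers write disjoint rows, so no locking is needed. */
static void *convert_rows(void *arg) {
    const convert_job *job = (const convert_job *)arg;
    for (size_t r = (size_t)job->ith; r < job->rows; r += (size_t)job->nth) {
        const float s = job->scale[r];
        for (size_t c = 0; c < job->cols; ++c) {
            job->out[r * job->cols + c] = s * (float)job->q[r * job->cols + c];
        }
    }
    return NULL;
}

#define MAX_THREADS 16  /* assumed cap, keeps the sketch free of malloc */

/* Converts the whole matrix with up to MAX_THREADS workers.
   Returns 0 on success, or -1 if a thread could not be created
   (in that case the output is only partially filled). */
static int convert_to_f32_parallel(const int8_t *q, const float *scale,
                                   float *out, size_t rows, size_t cols,
                                   int nth) {
    pthread_t   tids[MAX_THREADS];
    convert_job jobs[MAX_THREADS];
    int started = 0;
    int rc = 0;

    if (nth < 1) nth = 1;
    if (nth > MAX_THREADS) nth = MAX_THREADS;

    for (int i = 0; i < nth; ++i) {
        jobs[i] = (convert_job){ q, scale, out, rows, cols, i, nth };
        if (pthread_create(&tids[i], NULL, convert_rows, &jobs[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* Wait for every started worker before the FP32 buffer is used. */
    for (int i = 0; i < started; ++i) {
        pthread_join(tids[i], NULL);
    }
    return rc;
}
```

Once `convert_to_f32_parallel` returns 0, the `out` buffer holds plain FP32 data and could be passed to a single-precision matrix multiply. The join acts as the barrier between the conversion step and the multiply.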

References

#5045 - ggml: parallelize dequantization or fp format conversion when using blas

Author

ReinForce-II

Parents

3ce7e8f8
