llama.cpp
llama : improve batched CPU perf with BLAS #4240 (Merged)

ggerganov merged 3 commits into master from gg/fix-cpu-blas
Commits:
- f815fe43 ggml : use blas even if src0 is not F32
- e9b7a5cb llama : use n_threads_batch only when n_tokens >= 32
- 87f4102a llama : revert n_threads_batch logic
ggerganov force-pushed to 87f4102a 2 years ago
ggerganov merged 8406b092 into master 2 years ago
