llama.cpp
e9b7a5cb
- llama : use n_threads_batch only when n_tokens >= 32
Commit
1 year ago
llama : use n_threads_batch only when n_tokens >= 32 ggml-ci
References
#4240 - llama : improve batched CPU perf with BLAS
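The change gates the thread count on batch size: the batch thread count is used only once the batch holds at least 32 tokens, and smaller batches fall back to the regular thread count, since with BLAS the extra batch threads only pay off when the matrix multiplications are large enough. A minimal sketch of the selection logic, with the struct and helper names assumed for illustration rather than copied from llama.cpp:

#include <cstdint>

// Hypothetical stand-in for the llama.cpp context parameters involved;
// the field names follow the commit message, the struct is illustrative.
struct compute_params {
    int32_t n_threads;       // thread count for small batches (e.g. token generation)
    int32_t n_threads_batch; // thread count for large-batch prompt processing
};

// Pick the thread count for an eval call: use n_threads_batch only when
// the batch is large enough (n_tokens >= 32) for BLAS-backed matmuls to
// dominate; otherwise extra threads mostly add synchronization overhead.
static int32_t select_n_threads(const compute_params & cp, uint32_t n_tokens) {
    return n_tokens >= 32 ? cp.n_threads_batch : cp.n_threads;
}

The threshold of 32 plausibly mirrors ggml's own minimum-dimension condition for dispatching matrix multiplications to BLAS, but that is an inference from the linked PR, not something the commit message states.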
Author
ggerganov
Committer
ggerganov
Parents
f815fe43
Files (1)
llama.cpp