llama.cpp
7032f4f6
- ggml : update softmax n_task calculation (#5126)
Commit
ggml : update softmax n_task calculation (#5126)

Updated the n_task calculation to use the maximum number of threads available. This improved prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.
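The change concerns how ggml picks the number of tasks (worker threads) for the soft-max op when scheduling a compute graph. Below is a minimal sketch of the idea in C; the names (n_threads, nrows) mirror ggml's style, but the "before" cap of 4 tasks and the exact expressions are an assumption reconstructed from the commit message, not a verbatim copy of the diff in ggml.c.

```c
#include <stdint.h>
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

// Before (assumed): soft-max was capped at a small fixed number of tasks,
// regardless of how many threads the backend was given.
static int n_tasks_soft_max_old(int n_threads, int64_t nrows) {
    return MIN(MIN(4, n_threads), (int) nrows);
}

// After: let soft-max use as many tasks as there are threads,
// bounded only by the number of rows to process.
static int n_tasks_soft_max_new(int n_threads, int64_t nrows) {
    return MIN(n_threads, (int) nrows);
}

int main(void) {
    // Hypothetical example: a 64-vCPU Graviton3 instance and a soft-max
    // input with 512 rows during prompt evaluation.
    int     n_threads = 64;
    int64_t nrows     = 512;
    printf("old n_tasks = %d\n", n_tasks_soft_max_old(n_threads, nrows)); // 4
    printf("new n_tasks = %d\n", n_tasks_soft_max_new(n_threads, nrows)); // 64
    return 0;
}
```

Under these assumptions, the old expression would pin soft-max to a handful of tasks while the new one spreads the work across all available threads, which is consistent with the reported prompt-eval gains on Graviton3.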
References
#5126 - ggml: softmax op: update the n_task calculation
Author
snadampal
Parents
5f1925a8