llama.cpp
7032f4f6
- ggml : update softmax n_task calculation (#5126)
Commit
ggml : update softmax n_task calculation (#5126)

Updated the n_task calculation to use the maximum number of threads available. This improved prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.
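The change concerns how ggml picks the number of tasks (worker threads) for the soft-max op when scheduling a compute graph. Below is a minimal sketch of the idea in C; the names (n_threads, nrows) mirror ggml's style, but the "before" cap of 4 tasks and the exact expressions are an assumption reconstructed from the commit message, not a verbatim copy of the diff in ggml.c.

```c
#include <stdint.h>
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

// Before (assumed): soft-max was capped at a small fixed number of tasks,
// regardless of how many threads the backend was given.
static int n_tasks_soft_max_old(int n_threads, int64_t nrows) {
    return MIN(MIN(4, n_threads), (int) nrows);
}

// After: let soft-max use as many tasks as there are threads,
// bounded only by the number of rows to process.
static int n_tasks_soft_max_new(int n_threads, int64_t nrows) {
    return MIN(n_threads, (int) nrows);
}

int main(void) {
    // Hypothetical example: a 64-vCPU Graviton3 instance and a soft-max
    // input with 512 rows during prompt evaluation.
    int     n_threads = 64;
    int64_t nrows     = 512;
    printf("old n_tasks = %d\n", n_tasks_soft_max_old(n_threads, nrows)); // 4
    printf("new n_tasks = %d\n", n_tasks_soft_max_new(n_threads, nrows)); // 64
    return 0;
}
```

Under these assumptions, the old expression would pin soft-max to a handful of tasks while the new one spreads the work across all available threads, which is consistent with the reported prompt-eval gains on Graviton3.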
References
#5126 - ggml: softmax op: update the n_task calculation
Author
snadampal
Parents
5f1925a8