llama.cpp
e9e661bd
- CUDA: remove unnecessary warp reduce in FA (ggml/1032)
Commit
283 days ago
CUDA: remove unnecessary warp reduce in FA (ggml/1032)

* kqmax_new_j is the same in every thread within the warp after the operation at line 199, so this reduce can be omitted

* same problem in vec32

---------

Co-authored-by: ZhaoXiaoYu <zhao.xiaoyu@zte.com.cn>
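
The reasoning in the commit message can be illustrated with a small host-side model. This is only a sketch, not the kernel code from the commit. It assumes that the earlier operation is a butterfly (XOR-shuffle) max reduction, which the commit message does not state. The names `Warp`, `warp_reduce_max_sim`, and `all_lanes_equal` are hypothetical helpers introduced for illustration. Under that assumption, every lane already holds the warp-wide maximum after the first reduction, so a second reduction over the same values changes nothing.

```cpp
#include <algorithm>
#include <array>

// Hypothetical host-side model of one warp: one float per lane.
constexpr int WARP_SIZE = 32;
using Warp = std::array<float, WARP_SIZE>;

// Simulates a butterfly max reduction. At each step, lane i combines its
// value with that of lane (i ^ offset), as an XOR shuffle would on the GPU.
// Assumption: this models the operation the commit message refers to.
Warp warp_reduce_max_sim(Warp v) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        Warp next = v;
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            next[lane] = std::max(v[lane], v[lane ^ offset]);
        }
        v = next;
    }
    return v;
}

// Returns true if every lane holds the same value as lane 0.
bool all_lanes_equal(const Warp & v) {
    return std::all_of(v.begin(), v.end(),
                       [&](float x) { return x == v[0]; });
}
```

Calling `warp_reduce_max_sim` on arbitrary per-lane values gives a result for which `all_lanes_equal` holds, and applying it a second time returns an identical array. In this model the second reduce is pure overhead, which is the kind of redundancy the commit describes removing.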
References
#10639 - sync : ggml
Author
mahorozte
Committer
ggerganov
Parents
efb6ae96