llama.cpp
e9e661bd - CUDA: remove unnecessary warp reduce in FA (ggml/1032)

Commit
283 days ago
CUDA: remove unnecessary warp reduce in FA (ggml/1032)

* kqmax_new_j is already the same in every thread within the warp after the operation at line 199, so this reduce can be omitted
* same problem in vec32

---------

Co-authored-by: ZhaoXiaoYu <zhao.xiaoyu@zte.com.cn>
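The reasoning in the commit message can be illustrated with a small CPU-side model. This is only a sketch, not the kernel code: it assumes the earlier operation is a butterfly (XOR-shuffle) max all-reduce across a 32-lane warp, and the names `warp_all_reduce_max` and `all_lanes_equal` are hypothetical helpers introduced for this example. Under that assumption, every lane already holds the warp-wide maximum after the first reduce, so a second reduce returns the same values.

```cpp
#include <algorithm>
#include <array>

constexpr int WARP_SIZE = 32;  // assumed warp width

// CPU model of a butterfly all-reduce (hypothetical helper, not ggml code):
// at each step, lane i takes the max of its own value and the value held by
// lane (i ^ offset), which mimics an XOR shuffle between lanes on the GPU.
std::array<float, WARP_SIZE> warp_all_reduce_max(std::array<float, WARP_SIZE> v) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        std::array<float, WARP_SIZE> next = v;
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            next[lane] = std::max(v[lane], v[lane ^ offset]);
        }
        v = next;  // all lanes advance together, as in lockstep execution
    }
    return v;
}

// Returns true when every lane holds the same value (hypothetical helper).
bool all_lanes_equal(const std::array<float, WARP_SIZE>& v) {
    return std::all_of(v.begin(), v.end(),
                       [&](float x) { return x == v[0]; });
}
```

Applying `warp_all_reduce_max` to the result of a previous `warp_all_reduce_max` leaves the array unchanged, which is the redundancy the commit removes.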