llama.cpp
8b11deea - Hide latency of bias and gate-loading (#16847)

Commit
52 days ago
Hide latency of bias and gate-loading (#16847)

The bias and gate values are loaded into registers before the dot product is computed, so the loads are effectively batched together with it. Many threads are alive at this point, so the warp scheduler has enough threads available to hide the cost of loading those two extra floats.
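The load ordering described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the kernel name `matvec_bias_gate`, the host helper `run_matvec`, the one-warp-per-row layout, and the way the gate value is applied (a plain multiply) are all assumptions made for the example. The source only states that two floats are loaded into registers before the dot product. Whether the loads actually overlap with the dot product depends on the compiler and the hardware.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: one warp (32 threads) computes one output row.
// Assumed formula for illustration: y[row] = (dot(W[row, :], x) + bias[row]) * gate[row]
__global__ void matvec_bias_gate(const float* W, const float* x,
                                 const float* bias, const float* gate,
                                 float* y, int ncols) {
    const int row  = blockIdx.x;
    const int lane = threadIdx.x;

    // Issue the two scalar loads up front so they can be in flight
    // while the dot-product loop below is executing.
    float b = 0.0f;
    float g = 0.0f;
    if (lane == 0) {
        b = bias[row];
        g = gate[row];
    }

    // Each lane accumulates a strided slice of the row.
    float sum = 0.0f;
    for (int c = lane; c < ncols; c += 32) {
        sum += W[row * ncols + c] * x[c];
    }

    // Warp-level tree reduction; lane 0 ends up with the full sum.
    for (int offset = 16; offset > 0; offset >>= 1) {
        sum += __shfl_down_sync(0xffffffffu, sum, offset);
    }

    // The bias and gate values are already in registers here.
    if (lane == 0) {
        y[row] = (sum + b) * g;
    }
}

// Hypothetical host helper: copies inputs to the device, launches the
// kernel with one 32-thread block per row, and returns the result.
// Error checking is omitted to keep the sketch short.
std::vector<float> run_matvec(const std::vector<float>& W,
                              const std::vector<float>& x,
                              const std::vector<float>& bias,
                              const std::vector<float>& gate,
                              int nrows, int ncols) {
    float *dW, *dx, *db, *dg, *dy;
    cudaMalloc(&dW, W.size() * sizeof(float));
    cudaMalloc(&dx, x.size() * sizeof(float));
    cudaMalloc(&db, bias.size() * sizeof(float));
    cudaMalloc(&dg, gate.size() * sizeof(float));
    cudaMalloc(&dy, nrows * sizeof(float));

    cudaMemcpy(dW, W.data(), W.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, bias.data(), bias.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dg, gate.data(), gate.size() * sizeof(float), cudaMemcpyHostToDevice);

    matvec_bias_gate<<<nrows, 32>>>(dW, dx, db, dg, dy, ncols);

    // This copy runs on the default stream, so it waits for the kernel.
    std::vector<float> y(nrows);
    cudaMemcpy(y.data(), dy, nrows * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dW); cudaFree(dx); cudaFree(db); cudaFree(dg); cudaFree(dy);
    return y;
}
```

The alternative ordering would read `bias[row]` and `gate[row]` only after the reduction, which leaves the final store waiting on two fresh global-memory loads. Placing the loads first gives the scheduler independent work (the dot product) to run while they complete.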