llama.cpp
00681dfc - CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#15872)

Committed 2 days ago
* Add `fastdiv` and `fastmodulo` to `k_bin_bcast` kernel
* Address review comments
* `prod_` instead of `prod` suffix
* Add test case for `k_bin_bcast_unravel` in CUDA backend
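The commit message does not spell out how `fastdiv` works. The general technique the name usually refers to replaces a 32-bit integer division by a divisor that stays fixed for a whole kernel launch (such as a tensor extent) with a precomputed multiply-high, an add, and a shift. What follows is a minimal host-side C++ sketch of one standard variant, the Granlund-Montgomery "round-up" method, assuming that is the scheme used here. The names `FastDiv`, `make_fastdiv`, `mulhi`, `fast_div`, and `fast_mod` are hypothetical and are not llama.cpp's actual helpers. On the GPU, `mulhi` corresponds to CUDA's `__umulhi` intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not llama.cpp's API: constants for dividing
// unsigned 32-bit integers by a fixed divisor d.
struct FastDiv {
    uint32_t mp;  // magic multiplier
    uint32_t L;   // shift amount, L = ceil(log2(d))
    uint32_t d;   // original divisor, kept so the modulo can be derived
};

// Host-side setup, done once per divisor (e.g. once per tensor dimension).
FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint32_t{1} << L) < d) {
        ++L;  // smallest L with 2^L >= d
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1; the product fits in 64 bits
    // and the result fits in 32 bits for every d >= 1.
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return FastDiv{static_cast<uint32_t>(mp), L, d};
}

// High 32 bits of a 32x32-bit product (what __umulhi computes on the device).
inline uint32_t mulhi(uint32_t a, uint32_t b) {
    return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// n / d via one multiply-high, one add, and one shift instead of a divide.
inline uint32_t fast_div(uint32_t n, const FastDiv& fd) {
    const uint64_t hi = mulhi(n, fd.mp);
    // Adding in 64 bits avoids overflow of (hi + n) when n is close to 2^32.
    return static_cast<uint32_t>((hi + n) >> fd.L);
}

// n % d derived from the quotient with one extra multiply and subtract.
inline uint32_t fast_mod(uint32_t n, const FastDiv& fd) {
    return n - fast_div(n, fd) * fd.d;
}
```

In a broadcasting kernel, the constants for each divisor would be computed once on the host and passed in as kernel arguments, so per-element index arithmetic (for example, wrapping an index into a smaller broadcast dimension with `%`) costs a few integer instructions per thread instead of a full 32-bit division.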