llama.cpp
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving a 1-3% E2E performance improvement
#15872
Merged

Commits
  • Add fastdiv and fastmodulo to k_bin_bcast kernel
    ORippler committed 115 days ago
  • Address review comments
    ORippler committed 112 days ago
  • `prod_` instead of `prod` suffix
    ORippler committed 112 days ago
  • Add test case for `k_bin_bcast_unravel` in CUDA backend
    ORippler committed 112 days ago