llama.cpp
00681dfc
- CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#15872)
Commit
2 days ago
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#15872)

* Add fastdiv and fastmodulo to k_bin_bcast kernel
* Address review comments
* `prod_` instead of `prod` suffix
* Add test case for `k_bin_bcast_unravel` in CUDA backend
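
The commit message names the technique but this page does not show the code. As a rough illustration only, here is a host-side C++ sketch of the general idea behind a "fast division": when the divisor stays fixed across many divisions (as tensor dimensions do in a broadcast kernel), the division can be replaced by a precomputed multiply and shift. This is not the code from the commit. The names `FastDivisor`, `make_fast_divisor`, `fast_div`, and `fast_mod` are hypothetical, and the sketch uses 64-bit intermediates on the host for clarity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not from the llama.cpp source):
// precomputed constants for dividing by a fixed 32-bit divisor.
struct FastDivisor {
    uint32_t magic;    // low 32 bits of the rounded-up reciprocal
    uint32_t shift;    // ceil(log2(divisor))
    uint32_t divisor;  // kept so the remainder can be computed
};

// Precompute the constants once per divisor (e.g. on the host).
inline FastDivisor make_fast_divisor(uint32_t d) {
    assert(d != 0);
    uint32_t shift = 0;
    while (shift < 32 && (uint64_t{1} << shift) < d) {
        ++shift;
    }
    // magic = floor(2^32 * (2^shift - d) / d) + 1.
    // The product fits in 64 bits because 2^shift - d < d <= 2^32 - 1.
    const uint64_t scaled = (uint64_t{1} << 32) * ((uint64_t{1} << shift) - d);
    const uint32_t magic = static_cast<uint32_t>(scaled / d + 1);
    return {magic, shift, d};
}

// n / d using one multiply-high, one add, and one shift.
inline uint32_t fast_div(uint32_t n, const FastDivisor& fd) {
    // High 32 bits of the 64-bit product n * magic.
    const uint64_t hi = (static_cast<uint64_t>(n) * fd.magic) >> 32;
    // The sum is done in 64 bits so it cannot overflow and a shift of 32 is valid.
    return static_cast<uint32_t>((hi + n) >> fd.shift);
}

// n % d derived from the fast quotient.
inline uint32_t fast_mod(uint32_t n, const FastDivisor& fd) {
    return n - fast_div(n, fd) * fd.divisor;
}
```

In a GPU kernel the constants would typically be computed once on the host and passed as kernel arguments, with the high half of the product obtained from a multiply-high intrinsic such as CUDA's `__umulhi`. Whether the commit does exactly this is an assumption.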
References
#15872 - CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance
Author
ORippler
Parents
4f658855