llama.cpp
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance
#15872
Merged
Commits
Add fastdiv and fastmodulo to k_bin_bcast kernel (ORippler, committed 115 days ago)
Address review comments (ORippler, committed 112 days ago)
`prod_` instead of `prod` suffix (ORippler, committed 112 days ago)
Add test case for `k_bin_bcast_unravel` in CUDA backend (ORippler, committed 112 days ago)