CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance #15872
Add fastdiv and fastmodulo to k_bin_bcast kernel
956a1d06
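For readers unfamiliar with the technique named in this commit, the following is a minimal host-side sketch of division by a precomputed "magic" multiplier, which replaces an integer division with a multiply-high, an add, and a shift. It is not the PR's code: `FastDiv`, `make_fastdiv`, `fast_div`, and `fast_mod` are hypothetical names chosen for illustration, and the 64-bit intermediate is an assumption made here to sidestep overflow. On the device, the high half of the product would typically come from the CUDA intrinsic `__umulhi`.

```cpp
#include <cstdint>

// Hypothetical helper type (not from the PR): constants precomputed
// once per divisor so that the kernel never executes a hardware divide.
struct FastDiv {
    uint32_t magic;  // multiplier derived from d
    uint32_t shift;  // L = ceil(log2(d))
    uint32_t d;      // the original divisor, kept for the modulo
};

// Precompute the constants on the host. Assumes d >= 1.
inline FastDiv make_fastdiv(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // magic = floor(2^32 * (2^L - d) / d) + 1, which fits in 32 bits.
    const uint64_t magic =
        ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return {static_cast<uint32_t>(magic), L, d};
}

// n / d using a multiply-high, an add, and a shift.
// The sum is formed in 64 bits so it cannot wrap for large n.
inline uint32_t fast_div(uint32_t n, const FastDiv& f) {
    const uint64_t hi = (static_cast<uint64_t>(n) * f.magic) >> 32;
    return static_cast<uint32_t>((hi + n) >> f.shift);
}

// n % d, derived from the fast quotient.
inline uint32_t fast_mod(uint32_t n, const FastDiv& f) {
    return n - fast_div(n, f) * f.d;
}
```

In a broadcast kernel the divisors are tensor dimensions that stay fixed for the whole launch, so the constants can be computed once on the host and passed as kernel arguments; each thread then unravels its flat index with `fast_div` and `fast_mod` instead of `/` and `%`.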
Address review comments
b63af608
ORippler changed the title from "CUDA: Add `fastdiv` and `fastmodulo` to `k_bin_bcast*`, giving 1-3% E2E performance" to "CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance" 7 days ago
`prod_` instead of `prod` suffix
3cd67088
Add test case for `k_bin_bcast_unravel` in CUDA backend
4014ae38
ORippler deleted the osimons/add_fastdiv_to_k_bin_bcast branch 6 days ago
Assignees
No one assigned
Labels
testing
Nvidia GPU
ggml