llama.cpp
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance
#15872
Merged
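In the title, `fastdiv` typically means replacing an integer division by a divisor that stays fixed for a whole kernel launch with a multiply-high, an add, and a shift. The C++ sketch below illustrates that idea using the "division by invariant integers" round-up method of Granlund and Montgomery. The names `FastDiv` and `make_fastdiv`, and the signatures of `fastdiv`/`fastmodulo`, are hypothetical. This is not the code merged in this PR. On a GPU, the 64-bit multiply that produces the high half would usually be the `__umulhi` intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct: precomputed constants for dividing by a fixed d (1 <= d < 2^32).
struct FastDiv {
    uint32_t mp; // magic multiplier m' = floor(2^32 * (2^L - d) / d) + 1
    uint32_t L;  // shift amount, L = ceil(log2(d))
    uint32_t d;  // original divisor, kept so the modulo can be derived
};

// Host-side setup: runs once per divisor, e.g. before a kernel launch.
FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0 && "divisor must be non-zero");
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // (2^L - d) < d < 2^32, so the 64-bit product cannot overflow,
    // and the result always fits in 32 bits for d < 2^32.
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return FastDiv{static_cast<uint32_t>(mp), L, d};
}

// n / d without a hardware divide: take the high 32 bits of n * mp,
// add n, then shift right by L.
uint32_t fastdiv(uint32_t n, const FastDiv& f) {
    const uint32_t hi = static_cast<uint32_t>((uint64_t{n} * f.mp) >> 32);
    // Sum in 64 bits so hi + n cannot overflow when n is close to 2^32.
    return static_cast<uint32_t>((uint64_t{hi} + n) >> f.L);
}

// n % d derived from the quotient: one multiply and one subtract.
uint32_t fastmodulo(uint32_t n, const FastDiv& f) {
    return n - fastdiv(n, f) * f.d;
}
```

For example, `make_fastdiv(3)` yields L = 2 and mp = 1431655766, and `fastdiv(9, ...)` then returns 3. All the divisor-dependent work sits in `make_fastdiv`, so it can run once on the host, while each thread pays only for the multiply, add, and shift.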

ORippler Add fastdiv and fastmodulo to k_bin_bcast kernel
956a1d06
github-actions added the Nvidia GPU label
github-actions added the ggml label
JohannesGaessler commented on 2025-09-08
ORippler Address review comments
b63af608
ORippler changed the title from "CUDA: Add `fastdiv` and `fastmodulo` to `k_bin_bcast*`, giving 1-3% E2E performance" to "CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance" 7 days ago
ORippler `prod_` instead of `prod` suffix
3cd67088
ORippler Add test case for `k_bin_bcast_unravel` in CUDA backend
4014ae38
ORippler requested a review from JohannesGaessler 6 days ago
github-actions added the testing label
JohannesGaessler approved these changes on 2025-09-10
JohannesGaessler merged commit 00681dfc into master 6 days ago
ORippler deleted the `osimons/add_fastdiv_to_k_bin_bcast` branch 6 days ago
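The commit adding a test case for `k_bin_bcast_unravel` concerns a broadcast path that, presumably, maps a flat element index back to 4D coordinates before applying broadcasting. The sketch below shows the kind of index arithmetic that such a path involves, written with plain `/` and `%`, the operations `fastdiv`/`fastmodulo` stand in for. This is an assumption-based illustration, not the kernel's code: `Coords`, `unravel`, and `src1_flat_index` are hypothetical names, src1 is assumed contiguous, and each src1 extent is assumed to divide the matching dst extent.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical type: 4D coordinates, with i0 along ne[0], the innermost
// (fastest-varying) dimension in ggml's convention.
struct Coords {
    uint32_t i0, i1, i2, i3;
};

// Unravel a flat dst index i into (i0, i1, i2, i3) for dst extents ne[0..3].
// Every divisor here is fixed for a whole launch, which is what makes
// precomputed fastdiv/fastmodulo constants applicable.
Coords unravel(uint32_t i, const std::array<uint32_t, 4>& ne) {
    assert(i < ne[0] * ne[1] * ne[2] * ne[3]);
    Coords c;
    c.i3 = i / (ne[0] * ne[1] * ne[2]);
    c.i2 = (i / (ne[0] * ne[1])) % ne[2];
    c.i1 = (i / ne[0]) % ne[1];
    c.i0 = i % ne[0];
    return c;
}

// Broadcast a dst coordinate onto src1 with extents ne1[0..3]: each coordinate
// wraps modulo src1's extent along that dimension.
// Assumes a contiguous src1; a real kernel would use per-dimension strides.
uint32_t src1_flat_index(const Coords& c, const std::array<uint32_t, 4>& ne1) {
    const uint32_t j0 = c.i0 % ne1[0];
    const uint32_t j1 = c.i1 % ne1[1];
    const uint32_t j2 = c.i2 % ne1[2];
    const uint32_t j3 = c.i3 % ne1[3];
    return ((j3 * ne1[2] + j2) * ne1[1] + j1) * ne1[0] + j0;
}
```

With dst extents {4, 3, 2, 5} and src1 extents {4, 1, 2, 1}, flat index 47 unravels to (3, 2, 1, 1) and maps to src1 element 7.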
