llama.cpp
CUDA: reduce MMQ stream-k overhead
#22298
Merged

JohannesGaessler committed 5f1074e0: CUDA: reduce MMQ stream-k overhead
JohannesGaessler requested a review 25 days ago
IMbackK approved these changes on 2026-04-23
github-actions added the Nvidia GPU label
github-actions added the ggml label
ORippler commented on 2026-04-24
ORippler commented on 2026-04-24
JohannesGaessler committed 07376a7e: use 32 bit integers for kbc
am17an approved these changes on 2026-04-25
JohannesGaessler merged 9725a313 into master 23 days ago
