llama.cpp
CUDA: reduce MMQ stream-k overhead
#22298
Merged
JohannesGaessler merged 2 commits into ggml-org:master from JohannesGaessler:cuda-mmq-fastdiv-8
CUDA: reduce MMQ stream-k overhead (5f1074e0)
JohannesGaessler requested a review 25 days ago
IMbackK approved these changes on 2026-04-23
github-actions added Nvidia GPU
github-actions added ggml
ORippler commented on 2026-04-24
ORippler commented on 2026-04-24
use 32 bit integers for kbc (07376a7e)
am17an approved these changes on 2026-04-25
JohannesGaessler merged 9725a313 into master 23 days ago
Reviewers: am17an, IMbackK, ORippler
Assignees: No one assigned
Labels: Nvidia GPU, ggml
Milestone: No milestone