llama.cpp
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels
#21168
Merged
JohannesGaessler merged 11 commits into ggml-org:master from DENEB1312:master
ds_read_b128 for q4_0 and q4_1 mmq kernels (495c3632)
iacopPBK requested a review 12 days ago
github-actions added labels: Nvidia GPU, ggml
JohannesGaessler commented on 2026-03-30
Vectorized lds load update: used ggml_cuda_get_max_cpy_bytes and ggml… (cc9ea913)
JohannesGaessler commented on 2026-03-30
Explicit for loop in mmq, renamed vec into tmp (62c2f8f7)
JohannesGaessler commented on 2026-03-30
Fixed max_cpy usage in the loading loop (0bcddd21)
Fixed typo in q4_1 kernel (5d7df5df)
JohannesGaessler commented on 2026-04-01
Update ggml/src/ggml-cuda/mmq.cuh (d3065542)
Update ggml/src/ggml-cuda/mmq.cuh (777f5943)
Update ggml/src/ggml-cuda/mmq.cuh (fbc4cfcd)
Removed trailing white line 500 (b9a6e49b)
Update mmq.cuh removed other whitelines (ce4c2a23)
Remove trailing whitespaces (bc7b30ff)
pwilkin approved these changes on 2026-04-07
JohannesGaessler approved these changes on 2026-04-07
JohannesGaessler merged 66c4f9de into master 3 days ago
Reviewers: JohannesGaessler, pwilkin
Assignees: no one assigned
Labels: Nvidia GPU, ggml
Milestone: no milestone