llama.cpp
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels
#21168
Merged
JohannesGaessler merged 11 commits into ggml-org:master from DENEB1312:master
ds_read_b128 for q4_0 and q4_1 mmq kernels (495c3632)
iacopPBK requested a review 12 days ago
github-actions added labels: Nvidia GPU, ggml
JohannesGaessler commented on 2026-03-30
Vectorized lds load update: used ggml_cuda_get_max_cpy_bytes and ggml… (cc9ea913)
JohannesGaessler commented on 2026-03-30
Explicit for loop in mmq, renamed vec into tmp (62c2f8f7)
JohannesGaessler commented on 2026-03-30
Fixed max_cpy usage in the loading loop (0bcddd21)
Fixed typo in q4_1 kernel (5d7df5df)
JohannesGaessler commented on 2026-04-01
Update ggml/src/ggml-cuda/mmq.cuh (d3065542)
Update ggml/src/ggml-cuda/mmq.cuh (777f5943)
Update ggml/src/ggml-cuda/mmq.cuh (fbc4cfcd)
Removed trailing white line 500 (b9a6e49b)
Update mmq.cuh removed other whitelines (ce4c2a23)
Remove trailing whitespaces (bc7b30ff)
pwilkin approved these changes on 2026-04-07
JohannesGaessler approved these changes on 2026-04-07
JohannesGaessler merged 66c4f9de into master 3 days ago
Reviewers: JohannesGaessler, pwilkin
Assignees: no one assigned
Labels: Nvidia GPU, ggml
Milestone: no milestone