PR #8416 CUDA: optimize and refactor MMQ

JohannesGaessler merged 2 commits into ggml-org:master from JohannesGaessler:cuda-mmq-256k-5

CUDA: optimize and refactor MMQ

f4b8df49

github-actions added Nvidia GPU

JohannesGaessler added Review Complexity : High

explicit q8_1 memory layouts, add documentation

3c80cddb

slaren approved these changes on 2024-07-11

JohannesGaessler merged 808aba39 into master 1 year ago

Reviewers

slaren

Labels

Nvidia GPU, Review Complexity : High
