llama.cpp · Pull request #13014 · Merged
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
Author: JohannesGaessler
JohannesGaessler CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
8778dd21
github-actions added labels: testing, Nvidia GPU, ggml
JohannesGaessler commented on 2025-04-18; slaren and JohannesGaessler discussed the change in review.
Commit f0fcfd45 (JohannesGaessler): fix logic for RoPE support, CUDA graphs
Further review comments from JohannesGaessler and slaren.
slaren approved these changes on 2025-04-21
Commit 95dd4a4e (JohannesGaessler): add asserts for memory layout and batch size (see the sketch after the timeline)
JohannesGaessler merged 658987cf into master.
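
The final commit tightens the preconditions of the new code paths. As a rough, hedged illustration only (not the PR's actual asserts), checks of this kind in ggml's CUDA backend typically require each row of the quantized operand to be a contiguous run of blocks while allowing arbitrary strides between rows and batch slices, and cap the batch size the MMVQ path handles. The tensor fields `ne`/`nb` and the `GGML_ASSERT`, `ggml_type_size`, and `ggml_blck_size` helpers are real ggml API; the function name and the specific conditions below are assumptions for illustration.

```c
// Hypothetical sketch of layout/batch-size preconditions for a quantized
// mat-vec kernel that accepts non-contiguous inputs. The exact checks are
// assumptions, not the PR's actual code.
#include "ggml.h"

void check_mmvq_preconditions(const struct ggml_tensor * src0,
                              const struct ggml_tensor * src1) {
    // src0: quantized weights; each row must be a contiguous run of blocks,
    // but the strides between rows/batches (nb[1..3]) may be arbitrary.
    GGML_ASSERT(src0->nb[0] == ggml_type_size(src0->type));
    GGML_ASSERT(src0->ne[0] % ggml_blck_size(src0->type) == 0);

    // src1: activations in F32 with a contiguous innermost dimension.
    GGML_ASSERT(src1->type == GGML_TYPE_F32);
    GGML_ASSERT(src1->nb[0] == ggml_type_size(src1->type));

    // MMVQ targets small batch sizes (e.g. bs=1 per expert in MUL_MAT_ID);
    // the limit of 8 here is an assumed placeholder.
    GGML_ASSERT(src1->ne[1] <= 8);
}
```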
