CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID #13014
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
8778dd21
fix logic for RoPE support, CUDA graphs
f0fcfd45
slaren
approved these changes
on 2025-04-21
add asserts for memory layout and batch size
95dd4a4e
Assignees
No one assigned
Labels
testing
Nvidia GPU
ggml