llama.cpp · Pull request #13014 · Merged
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
Author: JohannesGaessler
JohannesGaessler CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
8778dd21
github-actions added labels: testing, Nvidia GPU, ggml
JohannesGaessler commented on 2025-04-18; slaren and JohannesGaessler discussed the change in review.
Commit f0fcfd45 (JohannesGaessler): fix logic for RoPE support, CUDA graphs
Further review comments from JohannesGaessler and slaren.
slaren approved these changes on 2025-04-21
Commit 95dd4a4e (JohannesGaessler): add asserts for memory layout and batch size (see the sketch after the timeline)
JohannesGaessler merged 658987cf into master.
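
The final commit tightens the preconditions of the new code paths. As a rough, hedged illustration only (not the PR's actual asserts), checks of this kind in ggml's CUDA backend typically require each row of the quantized operand to be a contiguous run of blocks while allowing arbitrary strides between rows and batch slices, and cap the batch size the MMVQ path handles. The tensor fields `ne`/`nb` and the `GGML_ASSERT`, `ggml_type_size`, and `ggml_blck_size` helpers are real ggml API; the function name and the specific conditions below are assumptions for illustration.

```c
// Hypothetical sketch of layout/batch-size preconditions for a quantized
// mat-vec kernel that accepts non-contiguous inputs. The exact checks are
// assumptions, not the PR's actual code.
#include "ggml.h"

void check_mmvq_preconditions(const struct ggml_tensor * src0,
                              const struct ggml_tensor * src1) {
    // src0: quantized weights; each row must be a contiguous run of blocks,
    // but the strides between rows/batches (nb[1..3]) may be arbitrary.
    GGML_ASSERT(src0->nb[0] == ggml_type_size(src0->type));
    GGML_ASSERT(src0->ne[0] % ggml_blck_size(src0->type) == 0);

    // src1: activations in F32 with a contiguous innermost dimension.
    GGML_ASSERT(src1->type == GGML_TYPE_F32);
    GGML_ASSERT(src1->nb[0] == ggml_type_size(src1->type));

    // MMVQ targets small batch sizes (e.g. bs=1 per expert in MUL_MAT_ID);
    // the limit of 8 here is an assumed placeholder.
    GGML_ASSERT(src1->ne[1] <= 8);
}
```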
