llama.cpp
8bece2eb - CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)

Committed 96 days ago
CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)

* CUDA: use mmvq for mul-mat-id for small batch sizes
* add mmvq too
* Fix perf issue on Ampere. Use mmvf mm-id only for non-NVIDIA GPUs
* templatize multi_token_path