llama.cpp
8bece2eb
- CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)
Commit
96 days ago
CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)

* CUDA: use mmvq for mul-mat-id for small batch sizes
* add mmvq too
* Fix perf issue on ampere. Use mmvf mm-id only for non-nvidia GPUs
* templatize multi_token_path
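The commit message describes a dispatch decision: for small batches, mul-mat-id is routed to a matrix-vector kernel instead of the general path, with the float variant (mmvf) restricted to non-NVIDIA GPUs. The sketch below is one possible reading of that rule, not the code from the commit. Every name in it (MulMatIdPath, MulMatIdArgs, choose_mul_mat_id_path) and the cutoff kSmallBatchMax are hypothetical and chosen only for illustration; the real threshold and conditions live in the CUDA backend and may differ.

```cpp
#include <cstdint>

// Hypothetical names throughout; this is an illustration, not llama.cpp code.
enum class MulMatIdPath {
    MMVQ,    // quantized matrix-vector kernel
    MMVF,    // float matrix-vector kernel
    GENERIC  // general mul-mat-id path for larger batches
};

struct MulMatIdArgs {
    bool         src_is_quantized; // expert weights stored in a quantized type
    bool         is_nvidia;        // device vendor
    std::int64_t n_tokens;         // batch size for this mul-mat-id call
};

// Assumed cutoff for "small batch"; the actual value is not stated in the commit message.
constexpr std::int64_t kSmallBatchMax = 4;

inline MulMatIdPath choose_mul_mat_id_path(const MulMatIdArgs & a) {
    // Larger batches stay on the general path.
    if (a.n_tokens > kSmallBatchMax) {
        return MulMatIdPath::GENERIC;
    }
    // Small batch with quantized weights: use the mmvq kernel.
    if (a.src_is_quantized) {
        return MulMatIdPath::MMVQ;
    }
    // Small batch with float weights: the message limits mmvf mm-id to
    // non-NVIDIA GPUs because of a performance issue seen on Ampere.
    if (!a.is_nvidia) {
        return MulMatIdPath::MMVF;
    }
    return MulMatIdPath::GENERIC;
}
```

Under these assumptions, a 2-token quantized call selects MMVQ on any vendor, while a 2-token float call selects MMVF only when is_nvidia is false.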
References
#18958 - CUDA: use mmvq for mul-mat-id for small batch sizes
Author
am17an
Parents
a6fd8ca1