llama.cpp
8bece2eb - CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)

Commit

96 days ago

CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)

* CUDA: use mmvq for mul-mat-id for small batch sizes
* add mmvq too
* Fix perf issue on ampere. Use mmvf mm-id only for non-nvidia GPUs
* templatize multi_token_path

References

#18958 - CUDA: use mmvq for mul-mat-id for small batch sizes

Author

am17an

Parents

a6fd8ca1
