llama.cpp
PR #3776: cuda : improve text-generation and batched decoding performance
Status: Merged
Commits (7)
- cuda : prints wip (ggerganov, 2 years ago)
- cuda : new cublas gemm branch for multi-batch quantized src0 (ggerganov, 2 years ago)
- cuda : add F32 sgemm branch (ggerganov, 2 years ago)
- cuda : fine-tune >= VOLTA params + use MMQ only for small batches (ggerganov, 2 years ago)
- cuda : remove duplicated cuBLAS GEMM code (ggerganov, 2 years ago)
- cuda : add CUDA_USE_TENSOR_CORES and GGML_CUDA_FORCE_MMQ macros (ggerganov, 2 years ago)
- build : add compile option to force use of MMQ kernels (ggerganov, 2 years ago)