llama.cpp
Commit: a3c28439
cuda : fine-tune >= VOLTA params + use MMQ only for small batches
Date: 2 years ago
References
#3776 - cuda : improve text-generation and batched decoding performance
Author
ggerganov
Parents
16b60dd7