llama.cpp
e04dc519
ggml-cuda : add rope f16, restore performance with parallel decoding (#3272)
Commit
1 year ago
ggml-cuda : add rope f16, restore performance with parallel decoding (#3272)

* ggml-cuda : add rope f16, restore performance
* offload KQ_mask with all models
* fix rope shift

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
References
#3272 - ggml-cuda : add rope f16, restore performance with parallel decoding
Author
slaren
Parents
db0fc2da