llama.cpp
e04dc519 - ggml-cuda : add rope f16, restore performance with parallel decoding (#3272)

ggml-cuda : add rope f16, restore performance with parallel decoding (#3272)

* ggml-cuda : add rope f16, restore performance
* offload KQ_mask with all models
* fix rope shift

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>