llama.cpp
e04dc519
ggml-cuda : add rope f16, restore performance with parallel decoding (#3272)
Commit
1 year ago
ggml-cuda : add rope f16, restore performance with parallel decoding (#3272)

* ggml-cuda : add rope f16, restore performance
* offload KQ_mask with all models
* fix rope shift

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
References
#3272 - ggml-cuda : add rope f16, restore performance with parallel decoding
Author
slaren
Parents
db0fc2da