llama.cpp
7e2b9974 - ggml-cuda : update rope implementation for parallel decoding (#3254)

Commit
2 years ago
ggml-cuda : update rope implementation for parallel decoding (#3254)

* ggml-cuda : update rope implementation for parallel decoding
* better solution for p0 computation
* fix rope
* simpler rope implementation

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>