llama.cpp
Commit 7c1bdd0e
2 years ago
llama : apply K-cache roping for Falcon and Baichuan
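
The subject refers to shifting cached K vectors in place: because RoPE rotations compose (rotating a vector to position p + d is the same as rotating its position-p form by d more steps), a K entry that is already roped can be moved to a new position with one extra rotation, which is what lets the "no context swaps" scheme of #3228 cover these architectures. Below is a minimal sketch of that idea, not the actual llama.cpp code; `rope_shift_k`, `head_dim`, and `freq_base` are illustrative names, and the adjacent-pair layout shown is the LLaMA/Baichuan style (Falcon's NeoX variant instead pairs dimension i with i + head_dim/2):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rotate one cached K head (already roped at its old position) by
// `delta` extra positions, relying on the composition property of
// RoPE: rot((p + delta) * theta) = rot(delta * theta) * rot(p * theta).
// Assumes an even head dimension and adjacent-pair rotation.
static void rope_shift_k(std::vector<float> & k, int delta, float freq_base = 10000.0f) {
    const std::size_t head_dim = k.size();
    for (std::size_t i = 0; i + 1 < head_dim; i += 2) {
        // frequency for this dimension pair: freq_base^(-i/head_dim)
        const float theta = delta * std::pow(freq_base, -float(i) / float(head_dim));
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = k[i + 0];
        const float x1 = k[i + 1];
        k[i + 0] = x0 * c - x1 * s; // standard 2-D rotation of the pair
        k[i + 1] = x0 * s + x1 * c;
    }
}
```

Since V carries no positional encoding in RoPE attention (only Q and K are rotated, and Q is recomputed each step), the K cache is the only part that needs this treatment when sequence positions shift.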
References
#3228 - llama : custom attention mask + parallel decoding + no context swaps
Author
ggerganov
Parents
0cbf3bfe