llama.cpp
c5650ed4 - server : avoid context swaps by shifting the KV cache
Commit (2 years ago)
server : avoid context swaps by shifting the KV cache
References
custom-attention-mask
#3228 - llama : custom attention mask + parallel decoding + no context swaps
Author
ggerganov
Parents
ce2d995a
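For context on what the commit title describes: when the context window fills up, the older approach was a "context swap", re-evaluating a large part of the prompt from scratch. The change referenced here instead discards the oldest tokens after a kept prefix and shifts the positions of the remaining KV-cache entries left, so generation continues without recomputation. The snippet below is a minimal, self-contained sketch of that bookkeeping only; it is not the actual server code, and names such as `n_keep` and `n_discard` are illustrative parameters echoing the discussion in #3228.

```cpp
// Conceptual sketch (not llama.cpp source): which cached positions are dropped
// and how the survivors are renumbered when the context window is full.
#include <cstdio>
#include <vector>

int main() {
    const int n_ctx  = 8;   // tiny context window for illustration
    const int n_keep = 2;   // prefix that is always kept (e.g. system prompt)

    // Positions currently stored in the KV cache: 0 .. n_ctx-1 (cache is full).
    std::vector<int> pos(n_ctx);
    for (int i = 0; i < n_ctx; ++i) pos[i] = i;

    // Instead of a context swap, discard the oldest tokens after n_keep
    // and shift the positions of everything that follows.
    const int n_left    = n_ctx - n_keep;
    const int n_discard = n_left / 2;

    std::vector<int> shifted;
    for (int p : pos) {
        if (p < n_keep) {
            shifted.push_back(p);             // kept prefix, unchanged
        } else if (p < n_keep + n_discard) {
            continue;                         // oldest tokens, discarded
        } else {
            shifted.push_back(p - n_discard); // remaining tokens, shifted left
        }
    }

    printf("kept %zu of %d cache cells; next token goes to position %d\n",
           shifted.size(), n_ctx, shifted.back() + 1);
    return 0;
}
```

With `n_ctx = 8` and `n_keep = 2`, three of the six non-kept cells are dropped and the rest slide left, freeing room for new tokens while the cached keys and values stay in place. In the real server this position shift is applied to the KV cache itself rather than recomputing it, which is what avoids the cost of a full context swap.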