transformers
9d1f8450 - Draft version of new KV Caching

Committed 2 years ago
Draft version of new KV Caching. This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks) / StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented, either in a third-party library or in transformers directly.
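The eviction policy that StreamingLLM / Attention Sinks rely on can be sketched as follows: always retain the first few "sink" token positions in the KV cache, keep a sliding window of the most recent positions, and evict everything in between. This is a minimal illustrative sketch only; the class and method names (`SinkKVCache`, `update`) are hypothetical and do not reflect the actual transformers API introduced by this commit.

```python
class SinkKVCache:
    """Toy KV cache with attention-sink eviction (StreamingLLM-style).

    Keeps the first `num_sink_tokens` entries plus a sliding window of
    the `window_length` most recent entries; middle entries are evicted.
    """

    def __init__(self, num_sink_tokens: int, window_length: int):
        self.num_sink_tokens = num_sink_tokens
        self.window_length = window_length
        self.keys: list = []    # one entry per cached token position
        self.values: list = []

    def update(self, key, value):
        """Append a new KV pair, then evict middle tokens if over budget."""
        self.keys.append(key)
        self.values.append(value)
        budget = self.num_sink_tokens + self.window_length
        if len(self.keys) > budget:
            # Keep the sink prefix and the most recent window; drop the middle.
            self.keys = self.keys[: self.num_sink_tokens] + self.keys[-self.window_length:]
            self.values = self.values[: self.num_sink_tokens] + self.values[-self.window_length:]
        return self.keys, self.values


cache = SinkKVCache(num_sink_tokens=2, window_length=3)
for t in range(8):  # feed 8 token positions
    keys, _ = cache.update(f"k{t}", f"v{t}")
print(keys)  # → ['k0', 'k1', 'k5', 'k6', 'k7']
```

In a real implementation the keys and values would be tensors sliced along the sequence dimension per layer, and position handling (e.g. rotary embeddings) needs care, but the retention rule is the same: sinks plus a recent window.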