DeepSpeed
901d8070 - KV Cache Improved Flexibility (#4668)

Committed 2 years ago
This change lays the foundation for properly supporting two key KV-cache improvements:

1. Delineation between local and dense KV caches/models at the cache level, in addition to the attention-module level.
2. Support for multiple types of disjoint KV caches (for example, GPT-Neo's alternating local and dense attention layers).

Follow-up item: determine appropriate statistics for weighting the ratio of local to dense KV blocks when both are present.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
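The sketch below illustrates why it can help to treat local and dense caches as disjoint groups at the cache level. It is a minimal illustration, not DeepSpeed's implementation. It sizes a blocked KV cache per group for a model that alternates dense and local (sliding-window) attention layers. The names `AttentionKind`, `KVCacheGroup`, `blocks_per_sequence`, and `kv_cache_bytes` are hypothetical. So are the GPT-Neo-like settings in the usage lines: 24 layers split evenly, a 256-token window, 16 KV heads of dimension 128, fp16, and 64-token blocks.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AttentionKind(Enum):
    DENSE = "dense"  # attends to every previous token
    LOCAL = "local"  # attends only to the last `window` tokens


@dataclass(frozen=True)
class KVCacheGroup:
    """Hypothetical: one disjoint KV cache shared by all layers of one attention kind."""
    kind: AttentionKind
    num_layers: int
    window: Optional[int] = None  # required for LOCAL groups

    def __post_init__(self) -> None:
        if self.kind is AttentionKind.LOCAL and (self.window is None or self.window <= 0):
            raise ValueError("local groups need a positive window size")


def blocks_per_sequence(groups: List[KVCacheGroup], seq_len: int, block_size: int) -> List[int]:
    """Worst-case number of KV blocks one sequence needs in each group."""
    full = math.ceil(seq_len / block_size)
    counts = []
    for g in groups:
        if g.kind is AttentionKind.DENSE:
            counts.append(full)
        else:
            # A window of `window` tokens can straddle one extra block boundary,
            # but never needs more blocks than the whole sequence occupies.
            counts.append(min(full, math.ceil(g.window / block_size) + 1))
    return counts


def kv_cache_bytes(groups: List[KVCacheGroup], seq_len: int, block_size: int,
                   num_kv_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    """Total K+V bytes for one sequence across all groups (hypothetical sizing model)."""
    # Each block stores K and V (factor 2) for block_size tokens of one layer.
    per_block_per_layer = 2 * block_size * num_kv_heads * head_dim * dtype_bytes
    blocks = blocks_per_sequence(groups, seq_len, block_size)
    return sum(b * g.num_layers * per_block_per_layer for b, g in zip(blocks, groups))


# Assumed GPT-Neo-like layout: 24 layers alternating dense and local (window 256).
groups = [
    KVCacheGroup(AttentionKind.DENSE, num_layers=12),
    KVCacheGroup(AttentionKind.LOCAL, num_layers=12, window=256),
]
print(blocks_per_sequence(groups, seq_len=2048, block_size=64))  # [32, 5]
print(kv_cache_bytes(groups, 2048, 64, num_kv_heads=16, head_dim=128))  # 232783872
```

Under these assumed settings, the local group needs 5 blocks per sequence where the dense group needs 32. The total comes to about 233 MB, versus about 403 MB if all 24 layers used a dense cache. The follow-up item above is about how to weight such local-to-dense ratios when both groups share one memory pool.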