KV Cache Improved Flexibility (#4668)
This PR lays the foundation for properly supporting two key
KV-cache improvements:
1. Delineation between local/dense KV caches/models at the cache level
in addition to the attention module level.
2. Support for multiple types of disjoint KV caches (such as models like
GPT-Neo that alternate local and dense attention across layers).
Follow up item: Determine appropriate statistics for weighting local +
dense KV block ratios when both are present.
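As a rough illustration of the two points above, the sketch below keeps disjoint KV cache pools per attention type and routes each layer to its pool at the cache level rather than inside the attention module. The names (`KVCacheGroup`, `CacheManager`, `allocate_for_layer`) are hypothetical and not part of the actual API; this is a minimal sketch, not the implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# NOTE: illustrative sketch only; class and method names are hypothetical.

@dataclass
class KVCacheGroup:
    """One disjoint KV cache pool (e.g. for local or dense attention)."""
    name: str
    block_size: int
    num_blocks: int
    free_blocks: List[int] = field(default_factory=list)

    def __post_init__(self):
        # All blocks start free; allocation pops from the free list.
        self.free_blocks = list(range(self.num_blocks))

    def allocate(self) -> int:
        return self.free_blocks.pop()

class CacheManager:
    """Routes each layer to its cache group, so the local/dense
    distinction lives at the cache level as well as the module level."""
    def __init__(self, groups: Dict[str, KVCacheGroup], layer_map: List[str]):
        self.groups = groups
        self.layer_map = layer_map  # attention type per layer index

    def allocate_for_layer(self, layer_idx: int) -> Tuple[str, int]:
        group = self.groups[self.layer_map[layer_idx]]
        return (group.name, group.allocate())

# GPT-Neo-style alternation: even layers dense, odd layers local.
layer_map = ["dense" if i % 2 == 0 else "local" for i in range(4)]
manager = CacheManager(
    {"dense": KVCacheGroup("dense", block_size=16, num_blocks=8),
     "local": KVCacheGroup("local", block_size=16, num_blocks=4)},
    layer_map,
)
allocations = [manager.allocate_for_layer(i) for i in range(4)]
print(allocations)  # dense and local layers draw from separate pools
```

A real design would also need the weighting policy flagged in the follow-up item, since local and dense pools consume blocks at different rates.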
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>