KV Cache Improved Flexibility (#4668)
This PR lays the foundation for properly supporting two key
KV-cache improvements:
1. Delineation between local/dense KV caches/models at the cache level
in addition to the attention module level.
2. Support for multiple types of disjoint KV caches (such as models like
GPT-Neo that alternate local and dense attention across layers).
Follow up item: Determine appropriate statistics for weighting local +
dense KV block ratios when both are present.
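As a rough illustration of the two points above, the sketch below keeps disjoint KV cache pools per attention type and routes each layer to its pool at the cache level rather than inside the attention module. The names (`KVCacheGroup`, `CacheManager`, `allocate_for_layer`) are hypothetical and not part of the actual API; this is a minimal sketch, not the implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# NOTE: illustrative sketch only; class and method names are hypothetical.

@dataclass
class KVCacheGroup:
    """One disjoint KV cache pool (e.g. for local or dense attention)."""
    name: str
    block_size: int
    num_blocks: int
    free_blocks: List[int] = field(default_factory=list)

    def __post_init__(self):
        # All blocks start free; allocation pops from the free list.
        self.free_blocks = list(range(self.num_blocks))

    def allocate(self) -> int:
        return self.free_blocks.pop()

class CacheManager:
    """Routes each layer to its cache group, so the local/dense
    distinction lives at the cache level as well as the module level."""
    def __init__(self, groups: Dict[str, KVCacheGroup], layer_map: List[str]):
        self.groups = groups
        self.layer_map = layer_map  # attention type per layer index

    def allocate_for_layer(self, layer_idx: int) -> Tuple[str, int]:
        group = self.groups[self.layer_map[layer_idx]]
        return (group.name, group.allocate())

# GPT-Neo-style alternation: even layers dense, odd layers local.
layer_map = ["dense" if i % 2 == 0 else "local" for i in range(4)]
manager = CacheManager(
    {"dense": KVCacheGroup("dense", block_size=16, num_blocks=8),
     "local": KVCacheGroup("local", block_size=16, num_blocks=4)},
    layer_map,
)
allocations = [manager.allocate_for_layer(i) for i in range(4)]
print(allocations)  # dense and local layers draw from separate pools
```

A real design would also need the weighting policy flagged in the follow-up item, since local and dense pools consume blocks at different rates.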
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>