Offloaded KV Cache #31325
gante commented on 2024-06-14
gante approved these changes on 2024-06-27
n17s force pushed 1 year ago
n17s force pushed 1 year ago
n17s force pushed 1 year ago
Initial implementation of OffloadedCache
8e57b081
enable usage via cache_implementation
d0e86661
Address feedback, add tests, remove legacy methods.
2e63564f
Remove flash-attn, discover synchronization bugs, fix bugs
cf31e0ea
n17s force pushed to cf31e0ea 1 year ago
Prevent usage in CPU only mode
daf8702e
Add a section about offloaded KV cache to the docs
47950328
Fix typos in docs
1a76762f
Clarifications and better explanation of streams
667811a6