text-generation-inference
153fcf77 - Fix incorrect cache allocation with multi-query (#2203)

Commit
1 year ago
Fix incorrect cache allocation with multi-query (#2203)

We wouldn't allocate any memory in multi-query (1 KV head). Fixes Starcoder et al.
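
The message does not include the diff, so the following is only an illustrative sketch of how a KV cache can end up with zero bytes when a model has a single KV head. It is not the actual change made in #2203. The function names, the sharding-by-integer-division scenario, and the `max(1, ...)` guard are all assumptions made for this example.

```python
# Hypothetical illustration; none of these names come from the repository.

def kv_cache_bytes(num_layers, num_blocks, block_size,
                   num_kv_heads, head_size, dtype_bytes=2):
    """Size of a paged KV cache: one key and one value tensor per layer."""
    per_tensor = num_blocks * block_size * num_kv_heads * head_size * dtype_bytes
    return 2 * num_layers * per_tensor


def kv_heads_per_shard_naive(num_kv_heads, world_size):
    # Assumed failure mode: integer division rounds 1 KV head down to 0
    # as soon as the model is split across more than one shard.
    return num_kv_heads // world_size


def kv_heads_per_shard_guarded(num_kv_heads, world_size):
    # Assumed guard: with multi-query attention the single KV head is
    # replicated on every shard, so each shard keeps at least one head.
    return max(1, num_kv_heads // world_size)


if __name__ == "__main__":
    # Multi-query model (1 KV head) on 2 shards; the sizes are made up.
    for per_shard in (kv_heads_per_shard_naive, kv_heads_per_shard_guarded):
        heads = per_shard(num_kv_heads=1, world_size=2)
        size = kv_cache_bytes(num_layers=40, num_blocks=1024, block_size=16,
                              num_kv_heads=heads, head_size=128)
        print(per_shard.__name__, heads, size)
```

Because the number of KV heads is a plain multiplicative factor in the cache size, a head count of 0 silently produces an empty cache rather than an error, which is consistent with the symptom the message describes.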