vllm
75e94309 - [Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676)

Commit
308 days ago
[Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676)

Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
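The commit title says only that KV caches are memory-aligned on CUDA devices. The following is a minimal sketch of the general padding idea, not the vLLM implementation. It assumes a 256-byte alignment target, and the helper names (`align_up`, `padded_entry_elems`, `entry_offsets`) are hypothetical. Each cache entry is padded so that consecutive entries start on an aligned byte offset. For illustration, a 576-element fp16 entry (512 latent + 64 RoPE dims, as in DeepSeek-style MLA) takes 1152 bytes. That is not a multiple of 256, so unpadded entries would begin at misaligned addresses.

```python
# Hypothetical sketch: pad each KV-cache entry so that consecutive entries
# start on an ALIGN_BYTES boundary. ALIGN_BYTES = 256 is an assumption,
# not a value taken from the commit.
ALIGN_BYTES = 256


def align_up(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


def padded_entry_elems(entry_elems: int, elem_bytes: int,
                       align_bytes: int = ALIGN_BYTES) -> int:
    """Elements to allocate per entry so its byte size is a multiple of align_bytes."""
    if align_bytes % elem_bytes != 0:
        raise ValueError("alignment must be a multiple of the element size")
    entry_bytes = entry_elems * elem_bytes
    return align_up(entry_bytes, align_bytes) // elem_bytes


def entry_offsets(num_entries: int, entry_elems: int, elem_bytes: int,
                  align_bytes: int = ALIGN_BYTES) -> list[int]:
    """Byte offset of each entry when entries are laid out with padded strides."""
    stride_bytes = padded_entry_elems(entry_elems, elem_bytes, align_bytes) * elem_bytes
    return [i * stride_bytes for i in range(num_entries)]


if __name__ == "__main__":
    # Illustrative MLA-style entry: 576 fp16 elements (1152 bytes, unaligned).
    print(padded_entry_elems(576, 2))   # 640 elements -> 1280 bytes per entry
    print(entry_offsets(3, 576, 2))     # [0, 1280, 2560]
```

In this sketch the padding costs 64 of every 640 allocated elements (10%) in exchange for every entry starting on a 256-byte boundary. A common pattern is to allocate the padded buffer and then expose only the first `entry_elems` of each row as a strided view, so that code reading the cache still sees the logical shape.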