vllm
Commit 75e94309: [Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676)

Committed 160 days ago

Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Files changed
  • csrc/cache.h
  • csrc/cache_kernels.cu
  • csrc/torch_bindings.cpp
  • tests/kernels/test_cache.py
  • vllm/_custom_ops.py
  • vllm/attention/backends/triton_mla.py
  • vllm/attention/ops/triton_decode_attention.py
  • vllm/envs.py
  • vllm/utils.py
  • vllm/worker/cache_engine.py
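The commit title describes memory-aligning KV caches on CUDA devices. As a hedged illustration of the general idea only (this is not the vLLM implementation, and the 256-byte alignment target, function names, and shape parameters below are all assumptions for the sketch), aligning a cache typically means rounding each block's byte footprint up to a multiple of the hardware-friendly alignment so every block starts on an aligned address:

```python
# Hypothetical sketch of per-block KV-cache alignment.
# ALIGNMENT, align_up, and aligned_block_bytes are illustrative names,
# not symbols from the vLLM codebase.

ALIGNMENT = 256  # bytes; assumed alignment target for CUDA devices


def align_up(nbytes: int, alignment: int = ALIGNMENT) -> int:
    """Round nbytes up to the next multiple of alignment."""
    return (nbytes + alignment - 1) // alignment * alignment


def aligned_block_bytes(block_size: int, num_heads: int, head_dim: int,
                        dtype_bytes: int) -> int:
    """Padded per-block footprint so consecutive blocks stay aligned.

    block_size:  tokens per cache block
    num_heads:   KV heads per layer
    head_dim:    elements per head
    dtype_bytes: bytes per element (e.g. 2 for fp16/bf16)
    """
    raw = block_size * num_heads * head_dim * dtype_bytes
    return align_up(raw)
```

With an aligned per-block stride, kernels that load whole blocks (as in MLA decode attention) can issue fully aligned, coalesced memory transactions instead of straddling cache lines.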