vllm ec8c1cf7 - squashed commits
Commit
195 days ago
squashed commits

Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
References: #12588 - [WIP] MLA decode attention - cuda graph support
Author: LucasWilkinson
Committer: LucasWilkinson
Parents: f17f1d46
Files: 27
csrc/
    cache.h
    cache_kernels.cu
    torch_bindings.cpp
tests/kernels/
    test_triton_decode_attention.py
vllm/
    _custom_ops.py
    attention/
        backends/
            abstract.py
            mla/
                __init__.py
                utils.py
            triton_mla.py
        layer.py
        ops/
            triton_decode_attention.py
        selector.py
    config.py
    engine/
        arg_utils.py
    envs.py
    model_executor/
        model_loader/
            loader.py
        models/
            deepseek_v2.py
    platforms/
        cpu.py
        cuda.py
        hpu.py
        interface.py
        openvino.py
        rocm.py
        tpu.py
        xpu.py
    worker/
        cache_engine.py
        model_runner.py