vllm
1d0c9d6b
- [Kernel] some optimizations for dense marlin and moe marlin (#16850)
Commit
14 days ago
[Kernel] some optimizations for dense marlin and moe marlin (#16850)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
References
#16850 - [Kernel] some optimizations for dense marlin and moe marlin
Author
jinzhen-lin
Parents
f62cad64
Files
26
CMakeLists.txt
csrc
  moe/marlin_moe_wna16
    .gitignore
    generate_kernels.py
    kernel.h
    marlin_template.h
    ops.cu
  quantization/gptq_marlin
    .gitignore
    dequant.h
    generate_kernels.py
    gptq_marlin.cu
    kernel.h
    marlin_template.h
  torch_bindings.cpp
tests/kernels
  moe
    test_moe.py
  quantization
    test_awq_marlin.py
    test_marlin_gemm.py
vllm
  _custom_ops.py
  model_executor/layers
    fused_moe
      fused_marlin_moe.py
    quantization
      awq_marlin.py
      compressed_tensors/schemes
        compressed_tensors_w8a16_fp8.py
      fp8.py
      gptq_marlin.py
      kernels/mixed_precision
        marlin.py
      utils
        marlin_utils.py
        marlin_utils_fp8.py
  scalar_type.py