vllm
1d0c9d6b - [Kernel] some optimizations for dense marlin and moe marlin (#16850)

Commit

14 days ago

[Kernel] some optimizations for dense marlin and moe marlin (#16850)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

References

#16850 - [Kernel] some optimizations for dense marlin and moe marlin

Author

jinzhen-lin

Files (26)

CMakeLists.txt
csrc
- moe/marlin_moe_wna16
  - .gitignore
  - generate_kernels.py
  - kernel.h
  - marlin_template.h
  - ops.cu
- quantization/gptq_marlin
  - .gitignore
  - dequant.h
  - generate_kernels.py
  - gptq_marlin.cu
  - kernel.h
  - marlin_template.h
- torch_bindings.cpp
tests/kernels
- moe
  - test_moe.py
- quantization
  - test_awq_marlin.py
  - test_marlin_gemm.py
vllm
- _custom_ops.py
- model_executor/layers
  - fused_moe
    - fused_marlin_moe.py
  - quantization
    - awq_marlin.py
    - compressed_tensors/schemes
      - compressed_tensors_w8a16_fp8.py
    - fp8.py
    - gptq_marlin.py
    - kernels/mixed_precision
      - marlin.py
    - utils
      - marlin_utils.py
      - marlin_utils_fp8.py
- scalar_type.py