vllm
260d119e - [Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137)

1 year ago
  • csrc/quantization/cutlass_w8a8
    • broadcast_load_epilogue_c2x.hpp
    • broadcast_load_epilogue_c3x.hpp
    • scaled_mm_dq_c2x.cu
    • scaled_mm_dq_c3x.cu
  • pyproject.toml
  • tests/kernels
    • test_cutlass.py
  • vllm/model_executor/layers/quantization/compressed_tensors/schemes
    • compressed_tensors_w8a8_statictensor.py