vllm
260d119e
- [Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137)
Commit
1 year ago
References
#5137 - [Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU
Author
tlrmchlsmth
Parents
a360ff80
Files (7)
csrc/quantization/cutlass_w8a8/broadcast_load_epilogue_c2x.hpp
csrc/quantization/cutlass_w8a8/broadcast_load_epilogue_c3x.hpp
csrc/quantization/cutlass_w8a8/scaled_mm_dq_c2x.cu
csrc/quantization/cutlass_w8a8/scaled_mm_dq_c3x.cu
pyproject.toml
tests/kernels/test_cutlass.py
vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_statictensor.py