DeepSpeed
6c86ff39 - adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference (#4450)

Committed 2 years ago
* kernels added for asym fine-grained block quantization with 8 bits
* formatting
* clean up the code
* rename quantize_int4.cu to quantize_intX.cu
* rename test_int4_quantization.py to test_intX_quantization.py
* "rename test_int4_quantization.py to test_intX_quantization.py"
  This reverts commit 2d341405b2ed6cf69e83fcabad1804513ea92122.
* rename
* fix after the PR comments
* increased coverage of the QuantLinear test (with and without the CUDA kernels)
* formatting

---------

Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
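For context, asymmetric fine-grained block quantization stores one scale and one minimum (zero point) per small block of elements, so 8-bit dequantization recovers each value as roughly `x ≈ q * scale + min`. The CUDA sketch below is only a minimal illustration of that mapping under assumed conventions (contiguous blocks of `block_size` elements, fp32 scales and mins); it is not the kernel added in this commit, and all names in it are hypothetical.

```cuda
#include <cstdint>

// Hypothetical sketch: asymmetric per-block 8-bit dequantization.
// Each block of `block_size` uint8 values shares one scale and one min,
// so the dequantized value is q * scale + min.
__global__ void dequant_asym_int8_blockwise(const uint8_t* q,     // quantized values
                                            const float* scales,  // one scale per block
                                            const float* mins,    // one min per block
                                            float* out,           // dequantized output
                                            int num_elems,
                                            int block_size)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= num_elems) return;

    int blk = idx / block_size;  // which quantization block this element belongs to
    out[idx] = static_cast<float>(q[idx]) * scales[blk] + mins[blk];
}
```

The small block size is what makes the scheme "fine-grained": each block carries its own scale and min, which limits how far outliers in one block can degrade the precision of the others.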