add 8-bit dequantization kernel for asymmetric fine-grained block quantization in ZeRO-Inference (#4450)
* add kernels for asymmetric fine-grained block quantization with 8 bits (see the dequantization sketch after this list)
* formatting
* clean up the code
* rename quantize_int4.cu to quantize_intX.cu
* rename test_int4_quantization.py to test_intX_quantization.py
* "rename test_int4_quantization.py to test_intX_quantization.py"
This reverts commit 2d341405b2ed6cf69e83fcabad1804513ea92122.
* rename
* fix issues raised in the PR review comments
* increase coverage of the QuantLinear test (with and without the CUDA kernels)
* formatting
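
For context, asymmetric fine-grained block quantization stores one (scale, min) pair per small block of values, so 8-bit dequantization recovers x = q * scale + min per element. The following is a minimal hypothetical sketch of that math, not the kernel added in this PR; BLOCK_SIZE, the function names, and the launch setup are all assumptions for illustration.

```cuda
// Hedged sketch of 8-bit asymmetric block dequantization: each block of
// BLOCK_SIZE quantized values shares a float scale and a float minimum,
// and dequantization computes x = q * scale + min.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK_SIZE = 64;  // assumed quantization group size

__global__ void dequant_int8_asym(const uint8_t* q, const float* scale,
                                  const float* min_val, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int b = i / BLOCK_SIZE;                  // owning quantization block
        out[i] = q[i] * scale[b] + min_val[b];   // asymmetric: scale + offset
    }
}

int main()
{
    const int n = 256, nblocks = n / BLOCK_SIZE;
    uint8_t hq[n];
    float hs[nblocks], hm[nblocks], hout[n];
    for (int i = 0; i < n; ++i) hq[i] = (uint8_t)i;
    for (int b = 0; b < nblocks; ++b) { hs[b] = 0.01f; hm[b] = -1.0f; }

    uint8_t* dq; float *ds, *dm, *dout;
    cudaMalloc(&dq, n);
    cudaMalloc(&ds, nblocks * sizeof(float));
    cudaMalloc(&dm, nblocks * sizeof(float));
    cudaMalloc(&dout, n * sizeof(float));
    cudaMemcpy(dq, hq, n, cudaMemcpyHostToDevice);
    cudaMemcpy(ds, hs, nblocks * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dm, hm, nblocks * sizeof(float), cudaMemcpyHostToDevice);

    dequant_int8_asym<<<(n + 255) / 256, 256>>>(dq, ds, dm, dout, n);
    cudaMemcpy(hout, dout, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0]=%f out[n-1]=%f\n", hout[0], hout[n - 1]);

    cudaFree(dq); cudaFree(ds); cudaFree(dm); cudaFree(dout);
    return 0;
}
```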
---------
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>