[quant] PerChannelFloatQParams support for quint4x2 dtype (#45594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45594
Adds support for per-channel quantization using float qparams for the 4-bit quint4x2 dtype.
We use the new dispatch mechanism and reuse the existing quantize/dequantize kernels, packing
the 4-bit data according to the bit_width.
A 4-bit quantized tensor is half the size of an 8-bit quantized tensor.
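For context, a minimal sketch of what this path enables (not taken from the PR or its test; it assumes torch.quantize_per_channel dispatches to the float-qparams kernels when zero_points is a float tensor, as exercised by test_quantize_per_channel_sub_byte):

    import torch

    # Per-channel float qparams: one float scale and one float zero point
    # per channel along the quantization axis (axis 0 here).
    r = torch.rand(3, 2, dtype=torch.float) * 4
    scales = torch.tensor([0.2, 0.3, 0.1], dtype=torch.float)
    zero_points = torch.tensor([0.1, 0.2, 0.3], dtype=torch.float)

    # 8-bit reference path and the new 4-bit (quint4x2) path; two 4-bit
    # values are packed per byte, so the 4-bit tensor needs about half
    # the storage of the 8-bit one.
    q8 = torch.quantize_per_channel(r, scales, zero_points, 0, torch.quint8)
    q4 = torch.quantize_per_channel(r, scales, zero_points, 0, torch.quint4x2)

    # Round-trip back to float as a sanity check on the packed data.
    print(q4.dequantize())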
Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_quantize_per_channel_sub_byte
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D24025595
fbshipit-source-id: dd9d0557de585dd4aaf5f138959c3523a29fb759