CUDA quantized tensors: support for quantize_per_channel (#58245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58245
This adds support for per-channel quantization of quantized tensors on CUDA.
(Note: this ignores all push blocking failures!)
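For context, a minimal sketch of what this enables, assuming the change lets the existing torch.quantize_per_channel API run on CUDA tensors and produce the same numerics as the CPU path (the tensor values and the device-matching requirements here are illustrative, not taken from the PR):

    import torch

    # Per-channel quantization parameters: one scale/zero_point per channel (axis 0).
    x = torch.randn(3, 4)
    scales = torch.tensor([0.1, 0.2, 0.15])
    zero_points = torch.tensor([0, 0, 0])

    # Reference: quantize on CPU.
    qx_cpu = torch.quantize_per_channel(
        x, scales, zero_points, axis=0, dtype=torch.quint8
    )

    # With this change, the same call is expected to work on CUDA tensors.
    if torch.cuda.is_available():
        qx_cuda = torch.quantize_per_channel(
            x.cuda(), scales.cuda(), zero_points.cuda(), axis=0, dtype=torch.quint8
        )
        # Dequantized values should match the CPU path numerically.
        assert torch.allclose(qx_cpu.dequantize(), qx_cuda.dequantize().cpu())

The device-numerics comparison mirrors what the tests in the plan below (e.g. test_compare_quant_dequant_device_numerics) are checking.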
Test Plan:
python test/test_quantization.py TestQuantizedTensors
python test/test_quantization.py TestQuantizedTensors.test_compare_quant_dequant_device_numerics
python test/test_quantization.py TestQuantizedTensors.test_qtensor_to_device
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D29018271
fbshipit-source-id: 4f59aed98f2f8ff607154250e4e3f85592e17854