Fake Quantization support for f16 and f64 (#52612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52612
Used the floating-point type dispatch macro to generalize the fake quantization per-tensor functions to f16 and f64.
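For illustration only (not code from this PR), a minimal sketch of what the change enables, assuming the public `torch.fake_quantize_per_tensor_affine` op and that the output keeps the input's dtype:
```
import torch

# Sketch: the per-tensor fake-quantize op now accepts half and double inputs
# in addition to float32, and the output stays in the input's dtype.
for dtype in (torch.float32, torch.float16, torch.float64):
    x = torch.randn(4, 4).to(dtype)
    # args: input, scale, zero_point, quant_min, quant_max
    y = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255)
    assert y.dtype == dtype
```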
Test Plan:
Added a test to show it works under AMP, and extended the forward and backward tests below to cover float16 and float64. Note: the reference function doesn't work with these types, so I convert the inputs to float32 for the reference and cast the results back before comparing.
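The comparison pattern looks roughly like this (a sketch with illustrative names; `_reference_fake_quant_fp32` stands in for the float32-only Python reference in the test file):
```
import torch

def _reference_fake_quant_fp32(x, scale, zero_point, quant_min, quant_max):
    # hypothetical stand-in for the float32-only reference implementation
    q = torch.clamp(torch.round(x / scale) + zero_point, quant_min, quant_max)
    return (q - zero_point) * scale

def check_against_reference(x, scale, zero_point, quant_min, quant_max):
    # run the op in the tensor's own dtype (e.g. float16 or float64) ...
    y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, quant_min, quant_max)
    # ... but run the reference in float32 and cast back before comparing
    y_ref = _reference_fake_quant_fp32(
        x.to(torch.float32), scale, zero_point, quant_min, quant_max).to(x.dtype)
    assert torch.allclose(y.float(), y_ref.float())
```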
```
python test/test_quantization.py TestFakeQuantize.test_forward_backward_per_tensor_with_amp
python test/test_quantization.py TestFakeQuantize.test_forward_per_tensor_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_backwards_per_tensor_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_forward_per_tensor_cachemask_cuda
python test/test_quantization.py TestFakeQuantize.test_backwards_per_tensor_cachemask_cuda
python test/test_quantization.py
```
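For context, a rough sketch of the kind of situation the AMP test covers (not the actual test body; assumes a CUDA device and the `torch.cuda.amp.autocast` context manager):
```
import torch

# Sketch: under autocast, the activation reaching the fake-quant op can be
# float16, which the op now handles instead of erroring out.
if torch.cuda.is_available():
    x = torch.randn(8, 16, device="cuda")
    w = torch.randn(4, 16, device="cuda")
    with torch.cuda.amp.autocast():
        y = torch.nn.functional.linear(x, w)  # autocast runs this in float16
        out = torch.fake_quantize_per_tensor_affine(y, 0.1, 0, 0, 255)
```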
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26586416
fbshipit-source-id: 55fe83c5e47f45cd1de8ddd69bd4a5653ab6dc12