Create a quantized in-place version of the CUDA ReLU function, relu_quantized_cuda_. (#85670)
Summary:
This PR and #85669 allow the relu function to run on a quantized tensor on CUDA, that is, torch.relu(qa) for a quantized tensor qa on CUDA.
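For reference, a minimal pure-Python sketch of what an in-place ReLU on a quantized tensor computes, assuming per-tensor affine quantization (real value = (q - zero_point) * scale). The helper names quantize, dequantize, and relu_quantized_ below are hypothetical and only illustrate the arithmetic; this is not the CUDA kernel added by this PR.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an unsigned 8-bit quantized integer (assumed quint8 range)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to its real value."""
    return (q - zero_point) * scale


def relu_quantized_(qvalues, zero_point):
    """In-place ReLU on a list of quantized integers.

    The real value 0.0 is represented by zero_point, so clamping each
    quantized value at zero_point is equivalent to ReLU on the real values.
    Scale and zero_point are left unchanged.
    """
    for i, q in enumerate(qvalues):
        if q < zero_point:
            qvalues[i] = zero_point
    return qvalues


# Hypothetical quantization parameters chosen for illustration.
scale, zero_point = 0.5, 10
qa = [quantize(x, scale, zero_point) for x in (-1.0, 0.0, 2.0)]
relu_quantized_(qa, zero_point)
reals = [dequantize(q, scale, zero_point) for q in qa]
```

Because the clamp operates directly on the stored integers, no dequantize/requantize round trip is needed, which is presumably why a dedicated quantized kernel is worthwhile.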
Test Plan:
python test/test_quantization.py
Previous PR, which has been reverted: #85502.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85670
Approved by: https://github.com/dzdang, https://github.com/z-a-f