Create a quantized non-in-place version of the CUDA ReLU function (#85669)
Summary:
This PR and #85670 allow the ReLU function to run on a quantized tensor on CUDA, i.e. torch.relu(qa) for a quantized tensor qa on CUDA.
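A minimal sketch of the call this PR enables. The CPU quantized path already works; the CUDA branch is the new behavior and is guarded since a GPU may not be available. The scale/zero_point values are illustrative assumptions, not from the PR:

```python
import torch

# Quantize a float tensor, then apply the out-of-place ReLU.
x = torch.tensor([-1.0, 0.0, 2.0])
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
qy = torch.relu(qx)       # out-of-place: qx is left unchanged
print(qy.dequantize())    # negative values clamped to 0

# The same call on a quantized CUDA tensor is what this PR enables.
if torch.cuda.is_available():
    qx_cuda = torch.quantize_per_tensor(
        x.cuda(), scale=0.1, zero_point=0, dtype=torch.qint8)
    qy_cuda = torch.relu(qx_cuda)
```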
Test Plan:
python test/test_quantization.py
Previous PR that has been reverted: #85502.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85669
Approved by: https://github.com/dzdang