Fixing quantize_per_tensor on CUDA (#57703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57703
The .bzl files were, for some reason, missing the registerQuantizedCUDA entry; after adding it, the previously broken CUDA commands now work.
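Illustratively, the fix amounts to adding the missing registration source to the relevant source list in the .bzl files. The list and file names below are placeholders inferred from this description, not copied from the actual diff:

```bzl
# Hypothetical .bzl excerpt -- the real list/target names in the internal
# build files differ. The point is that the generated CUDA registration
# source was absent, so quantized CUDA kernels were never registered.
generated_quantized_sources = [
    "RegisterQuantizedCPU.cpp",
    "RegisterQuantizedCUDA.cpp",  # previously missing (placeholder name)
]
```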
Note: these build files don't affect OSS builds, which were working throughout.
The test_qtensor test was potentially misleading: it would pass even when CUDA support was broken, as long as the build wasn't CUDA-enabled. I broke it out into independent tests for each device, so that systems without CUDA now produce a skip rather than a pass. A minimal sketch of that split follows.
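A sketch of the per-device split, assuming a shared helper parameterized by device (names and test bodies are illustrative; the real tests live in quantization/test_quantized_tensor.py and cover much more):

```python
import unittest
import torch

class TestQuantizedTensor(unittest.TestCase):
    def _test_qtensor(self, device):
        # quantize_per_tensor should produce a quantized tensor on the
        # requested device, not silently fall back or fail.
        x = torch.rand(3, 4, device=device)
        qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0,
                                       dtype=torch.quint8)
        self.assertEqual(qx.device.type, device)
        # Round-trip error is bounded by half a quantization step (0.05 here).
        self.assertTrue((qx.dequantize() - x).abs().max().item() <= 0.1)

    def test_qtensor_cpu(self):
        self._test_qtensor("cpu")

    @unittest.skipUnless(torch.cuda.is_available(), "CUDA not available")
    def test_qtensor_cuda(self):
        self._test_qtensor("cuda")
```

With this structure, a non-CUDA build reports test_qtensor_cuda as skipped instead of folding the CUDA path into a single always-passing test.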
Test Plan:
buck test mode/dbg //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_qtensor_cpu (quantization.test_quantized_tensor.TestQuantizedTensor)'
buck test mode/dbg //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_qtensor_cuda (quantization.test_quantized_tensor.TestQuantizedTensor)'
Reviewed By: jerryzh168
Differential Revision: D28242797
fbshipit-source-id: 938ae86dcd605aedf26ac0bace9db77deaaf9c0f