[bazel] Move torch/csrc/distributed/c10d/quantization/quantization_gpu.cu (#98188)
Fixes #79236
Avoid kernel de-registration problems in Bazel by having a single CUDA kernel library.
Test plan: cherry-picked onto a branch where we run all GPU tests and verified that this fixes the majority of the tests.
https://github.com/pytorch/pytorch/actions/runs/4593347787/jobs/8111184857?pr=96202
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98188
Approved by: https://github.com/malfet