Add XLA backend for torch.distributed (#3339) (#3378)
* Add XLA backend for torch.distributed
* Add XLA backend multiprocessing tests.
* Linter fixes.
* Address Jack's comments.
* Fix multiprocessing tests: forgot to import the backend.
* Addressing Shen and Jack's comments.
* Fix a search/replace error.
* Fix typo in test_mp_all_gather_xla_backend to make it real.
* Use the new reduce_scatter output parameter, and use tensor.copy_ for the all_gather result tensor to avoid graph execution.
* Lint fix.
* Fix TODO(alias).
* Add the XRT_WORKERS and XRT_DEVICE_MAP settings back to the unit test, as we do not aim to exercise GPU-specific code in the unit test.
* Lint fix.
* Skip XLA backend unit tests for GPU/TPU.
* Address Jack's comments.
* Rename tests according to Jack's comment.
Co-authored-by: Junmin Hao <86754787+hjm-aws@users.noreply.github.com>