Experimental TPU implementation of DistributedDataParallel (#4193)
* Add C++ API to get PJRT process ID
* Add PJRT-compatible DDP implementation
* Update ImageNet test for PJRT+DDP
* Use new DDP implementation in tests
* Fix tests
* Fix XRT test
* Formatting
* Make process group init optional
* Formatting
* Check for TPU before checking TPU version