XLA SPMD PoC implementation with XRT (#3476)
* Add experimental XLAShardedTensor and mark_sharding API
* Add ShardingSpec annotation to XLA tensor
* Add tensor sharding annotation and sharded HLO dumping function
* Use sharding custom_call & add more comments
* Disable GPU for SPMD