Add API to assemble CPU shards to a sharded tensor (#5681)

* Add API to assemble CPU shards to a sharded tensor
* Handle replicated sharding
* Move validations into get_op_sharding
* Improve tests and error handling
* Don't WrapXlaData
* Fix test for v3