[shard] use gather_object in gather API (#71624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71624
Now that gather is available in the NCCL process group, we can switch `sharded_tensor.gather` to use `gather_object` instead of `all_gather_object`. This reduces communication overhead, because only the destination rank receives the shards instead of every rank.
TODO: To further reduce the communication overhead, we need to find a way to avoid `gather_object` altogether, since both `gather_object` and `all_gather_object` pickle the objects and incur extra copies between devices.
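
As a rough illustration of why gathering to one rank is cheaper, the sketch below compares the pickled payload bytes that have to be delivered under each pattern. It is a simplified cost model, not the code in this change: the helper names (`pickled_sizes`, `all_gather_object_bytes`, `gather_object_bytes`) are hypothetical, it runs in a single process without `torch.distributed`, and it ignores padding and the actual collective algorithm used by the backend.

```python
import pickle


def pickled_sizes(objs):
    """Byte size of each rank's object once pickled (hypothetical helper)."""
    return [len(pickle.dumps(o)) for o in objs]


def all_gather_object_bytes(objs):
    """Bytes delivered when every rank receives every other rank's object."""
    sizes = pickled_sizes(objs)
    total = sum(sizes)
    # Each rank receives all payloads except its own.
    return sum(total - s for s in sizes)


def gather_object_bytes(objs, dst=0):
    """Bytes delivered when only rank `dst` receives the other ranks' objects."""
    sizes = pickled_sizes(objs)
    return sum(s for rank, s in enumerate(sizes) if rank != dst)


# Assumed setup: 4 ranks, each holding an identically sized stand-in for a shard.
shards = [list(range(1000)) for _ in range(4)]
all_gather_cost = all_gather_object_bytes(shards)
gather_cost = gather_object_bytes(shards, dst=0)
print(all_gather_cost, gather_cost)
```

With equally sized payloads on N ranks, this model delivers N * (N - 1) payloads for the all-gather pattern and N - 1 for the gather pattern, so the saving grows with the number of ranks. The pickling cost itself is unchanged, which is what the TODO above is about.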
ghstack-source-id: 151007578
Test Plan: Wait for CI.
Reviewed By: pritamdamania87
Differential Revision: D33688907
fbshipit-source-id: 2073c5a46c33a7a2640a9e3599dc795d9e4c0a1e
(cherry picked from commit dbc983afb76adbe5676768fb365626a313554739)