[C10D] [Easy] Use pinned memory for HtoD copies in Reducer::sync_bucket_indices (#69298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69298
I was exploring an invariant that non-blocking copies always use properly tracked pinned memory (to plug various correctness holes), and found this case, where we allocate a tensor without pinned memory and then copy it with non_blocking=True.
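For illustration only, here is a minimal Python sketch of the pattern described above. It is not the code from this change (the Reducer is C++). The helper name `stage_indices_to_device` is hypothetical, and the sketch assumes that pinning is only requested when the target device is CUDA.

```python
import torch


def stage_indices_to_device(indices, device):
    """Hypothetical helper: copy a list of ints to `device`.

    The host tensor is allocated in pinned (page-locked) memory when the
    target is a CUDA device, so the non_blocking copy has a pinned source.
    """
    use_cuda = device.type == "cuda"
    # Assumption: only request pinned memory when a CUDA target is in use.
    host = torch.tensor(indices, dtype=torch.int32, pin_memory=use_cuda)
    # With a pageable (unpinned) source, the host-to-device copy cannot be
    # truly asynchronous, which is the mismatch described above.
    dev_tensor = host.to(device, non_blocking=True)
    # Return the host tensor too, so the caller can keep it alive until the
    # copy has completed.
    return dev_tensor, host


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    dev_tensor, host = stage_indices_to_device([0, 1, 2, 3], device)
    if device.type == "cuda":
        # Wait for the asynchronous copy before reading the result.
        torch.cuda.synchronize()
    print(dev_tensor.tolist(), host.is_pinned())
```

On a machine without CUDA the sketch falls back to an ordinary CPU tensor, so it only demonstrates the allocation pattern, not the asynchronous behavior.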
Test Plan: Unit tests cover this code.
Reviewed By: rohan-varma
Differential Revision: D32786909
fbshipit-source-id: a53f96f57e6727238e4cd2164c1a0f04cf270413