[4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA implementations (#83810)
### About this PR
* Update the all_reduce op to dispatch to separate CPU and CUDA implementations. Both currently perform the same logic, so this change is effectively a no-op.
* Update the test to validate that device types without a registered implementation are not supported.
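As an illustrative sketch only (the names below are hypothetical, not the actual c10d dispatcher API), the per-device dispatch described above amounts to a registry keyed by device type, where the CPU and CUDA entries currently share one implementation and any other device is rejected:

```python
# Hypothetical sketch of per-device dispatch; not the real PyTorch
# dispatcher registration code.

def _all_reduce_common(tensors):
    # Placeholder for the shared logic both device kernels run today.
    return tensors

# Registry mapping a device key to its all_reduce_ implementation.
# CPU and CUDA currently point at the same function (hence "no-op").
_ALL_REDUCE_IMPLS = {
    "cpu": _all_reduce_common,
    "cuda": _all_reduce_common,
}

def all_reduce_(tensors, device_type):
    impl = _ALL_REDUCE_IMPLS.get(device_type)
    if impl is None:
        # Mirrors the updated test: devices without a registered
        # implementation are not supported.
        raise NotImplementedError(
            f"all_reduce_ has no implementation for device '{device_type}'"
        )
    return impl(tensors)
```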
### About this stack
In the future, ProcessGroup will be repurposed to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and dispatch to them based on tensor device type. The CPU and CUDA implementations will then be updated so that the process group selects its CPU and CUDA backends, respectively.
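A minimal sketch of that planned direction, using hypothetical class and method names (the real Backend/ProcessGroup interfaces may differ): a ProcessGroup holds one backend per device type and routes each collective to the backend matching the input tensors' device.

```python
# Hypothetical sketch of the planned design; names are illustrative,
# not the actual torch.distributed interfaces.

class Backend:
    """Stand-in for a backend such as ProcessGroupNCCL/Gloo/UCC."""
    def __init__(self, name):
        self.name = name

    def all_reduce(self, tensors):
        # A real backend would run the collective; here we just tag it.
        return f"{self.name}:all_reduce"

class ProcessGroup:
    """Holds per-device backends and dispatches on device type."""
    def __init__(self):
        self._backends = {}

    def register_backend(self, device_type, backend):
        self._backends[device_type] = backend

    def all_reduce(self, tensors, device_type):
        backend = self._backends.get(device_type)
        if backend is None:
            raise RuntimeError(f"no backend registered for '{device_type}'")
        return backend.all_reduce(tensors)

# CPU collectives go to a Gloo-like backend, CUDA to an NCCL-like one.
pg = ProcessGroup()
pg.register_backend("cpu", Backend("gloo"))
pg.register_backend("cuda", Backend("nccl"))
```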
Differential Revision: [D39506979](https://our.internmc.facebook.com/intern/diff/D39506979)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83810
Approved by: https://github.com/kwen2501