Implement autograd functions for c10d communication operations (#40762)
Summary:
Closes https://github.com/pytorch/pytorch/issues/40702, Fixes https://github.com/pytorch/pytorch/issues/40690
Currently WIP, but I would appreciate some feedback. The functions should be double-differentiable.
Contrary to https://github.com/pytorch/pytorch/blob/b35cdc5200af963e410c0a25400fd07f30b89bca/torch/nn/parallel/_functions.py,
this PR generates a list of tensors instead of aggregating the received data into a single tensor. Is this behavior correct?
Thanks!
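
To make the list-of-tensors question concrete, here is a minimal single-process sketch of the gradient flow for an all-gather-style operation. It is not the code in this PR: ranks are simulated as list indices, "tensors" are plain floats, and `all_gather_forward` / `all_gather_backward` are hypothetical names chosen for illustration. The point it shows is that when the forward pass returns one entry per rank, the backward pass receives one gradient per entry and has to sum the matching slot across all ranks.

```python
# Single-process simulation; "tensors" are plain floats and ranks are list indices.
# all_gather_forward / all_gather_backward are hypothetical names, not the PR's API.

def all_gather_forward(inputs_per_rank):
    """Every simulated rank receives a list holding one entry per rank."""
    return [list(inputs_per_rank) for _ in inputs_per_rank]


def all_gather_backward(grads_per_rank):
    """grads_per_rank[q][r] is the gradient rank q holds for the entry from rank r.

    Rank r's input appears in every rank's output list, so its gradient
    is the sum of slot r over all ranks.
    """
    world_size = len(grads_per_rank)
    return [
        sum(grads_per_rank[q][r] for q in range(world_size))
        for r in range(world_size)
    ]


inputs = [1.0, 2.0, 3.0]  # one value per simulated rank
outputs = all_gather_forward(inputs)

# Assume rank q computes loss_q = (q + 1) * sum(outputs[q]),
# so its gradient for every received slot is (q + 1).
grads = [[float(q + 1)] * len(inputs) for q in range(len(inputs))]
input_grads = all_gather_backward(grads)
```

With a single aggregated tensor, the same backward step would slice one gradient tensor per rank instead of indexing a list; the summation across ranks is the same in both layouts.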
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40762
Reviewed By: glaringlee
Differential Revision: D24758889
Pulled By: mrshenli
fbshipit-source-id: 79285fb4b791cae3d248f34e2aadb11c9ab10cce