Fix a bug where nn.functional._AllGather.backward produces wrong gradients (#120582)
Summary:
Fixes #120386
`_AllGather.backward` assumes that `_ReduceScatter` always updates the output buffer in place. However, when the output buffer is non-contiguous, `_ReduceScatter` allocates and returns a different buffer, and since `backward` ignores the return value, the computed gradient is discarded.
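The failure mode can be illustrated without a distributed setup. The sketch below uses a hypothetical `reduce_into` helper (a stand-in for `_ReduceScatter`'s behavior, not the real API) that writes in place when the output is contiguous but falls back to allocating a fresh buffer otherwise; dropping its return value loses the result exactly as in the bug:

```python
import torch

def reduce_into(out, src):
    # Hypothetical stand-in: prefers to write into `out` in place, but
    # allocates and returns a new buffer when `out` is non-contiguous.
    if out.is_contiguous():
        out.copy_(src)
        return out
    return src.contiguous()

src = torch.ones(2, 2)

# Non-contiguous output buffer (a transposed view).
grad = torch.zeros(2, 2).t()

reduce_into(grad, src)         # buggy pattern: return value dropped
print(grad.sum().item())       # grad is still all zeros

grad = reduce_into(grad, src)  # fixed pattern: use the returned buffer
print(grad.sum().item())       # grad now holds the reduced values
```

The fix is to always use the tensor `_ReduceScatter` returns rather than assuming the pre-allocated buffer was mutated.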
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120582
Approved by: https://github.com/XilunWu