Ensure tensors are contiguous in functional all_gather.
We called `tensor.contiguous()` in the forward pass, but only after
`out_tensor_list` had already been built from the original input. When the
input was non-contiguous, `out_tensor_list` therefore also contained
non-contiguous tensors, which caused errors.
This PR fixes the issue by moving the `contiguous()` call above the
construction of `out_tensor_list`.
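The sketch below illustrates why the ordering matters. It assumes the output buffers are allocated with `torch.empty_like(tensor)`. That function defaults to `memory_format=torch.preserve_format`, which copies the strides of a non-contiguous but dense input such as a transposed view. The helpers `build_outputs_buggy` and `build_outputs_fixed` are hypothetical names used only for illustration. The actual `torch.distributed.all_gather` call is omitted so that the example runs without a process group.

```python
import torch


def build_outputs_buggy(tensor: torch.Tensor, world_size: int):
    # Hypothetical helper: buffers are allocated from the input *before* it is
    # made contiguous, so empty_like (preserve_format) copies its strides.
    out_tensor_list = [torch.empty_like(tensor) for _ in range(world_size)]
    tensor = tensor.contiguous()  # too late: the buffers already have the old strides
    return tensor, out_tensor_list


def build_outputs_fixed(tensor: torch.Tensor, world_size: int):
    # Hypothetical helper: make the input contiguous first, then allocate the
    # output buffers from the contiguous tensor.
    tensor = tensor.contiguous()
    out_tensor_list = [torch.empty_like(tensor) for _ in range(world_size)]
    return tensor, out_tensor_list


# A transposed view is non-contiguous but dense, so its strides are preserved.
x = torch.randn(2, 3).t()
_, buggy = build_outputs_buggy(x, world_size=2)
_, fixed = build_outputs_fixed(x, world_size=2)
print([t.is_contiguous() for t in buggy])  # [False, False]
print([t.is_contiguous() for t in fixed])  # [True, True]
```

In the buggy ordering, the input passed to the collective is contiguous, but the output buffers it writes into are not.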
Differential Revision: [D37222870](https://our.internmc.facebook.com/intern/diff/D37222870/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79747
Approved by: https://github.com/fduwjj, https://github.com/wanchaol