Patch all_gather to support HSDP + TP (#118638)
Update all_gather to support HSDP + TP.
Currently, the `_all_gather_dtensor` function for DTensors replaces only the placement of the first mesh dimension (the FSDP dimension) with `Replicate` and leaves the second mesh dimension (assumed to be the TP dimension) untouched. With HSDP, there are two mesh dimensions ahead of the TP dimension instead of one, so replacing only the first one leaves the sharded data-parallel dimension ungathered. This PR updates the function to replace the placements of all mesh dimensions other than the TP dimension with `Replicate` before running the all-gather.
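The placement rewrite described above can be sketched in plain Python. This is a minimal sketch, not the code from this PR: `Shard` and `Replicate` below are hypothetical stand-in dataclasses for the real DTensor placement types, and `old_gather_placements` / `fsdp_gather_placements` are hypothetical helpers. It assumes the TP dimension is the last mesh dimension, and it omits the actual all-gather (e.g. a DTensor redistribution to the computed placements).

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for DTensor placement types (illustration only).
@dataclass(frozen=True)
class Shard:
    dim: int


@dataclass(frozen=True)
class Replicate:
    pass


Placement = Union[Shard, Replicate]


def old_gather_placements(placements: List[Placement]) -> List[Placement]:
    """Pre-fix behavior as described above: only mesh dim 0 becomes Replicate."""
    return [Replicate()] + list(placements[1:])


def fsdp_gather_placements(placements: List[Placement]) -> List[Placement]:
    """Replicate every mesh dim ahead of TP; keep the TP placement as-is.

    Assumes the last mesh dimension is TP and all earlier ones are
    data parallel (one for FSDP, two for HSDP).
    """
    if not placements:
        raise ValueError("expected at least one mesh dimension")
    return [Replicate() for _ in placements[:-1]] + [placements[-1]]


# FSDP + TP (2-D mesh); Shard(1) is just an example TP placement.
print(fsdp_gather_placements([Shard(0), Shard(1)]))
# [Replicate(), Shard(dim=1)]

# HSDP + TP (3-D mesh): [replicate dim, shard dim, TP dim].
hsdp = [Replicate(), Shard(0), Shard(1)]
print(old_gather_placements(hsdp))   # Shard(0) on mesh dim 1 is never gathered
# [Replicate(), Shard(dim=0), Shard(dim=1)]
print(fsdp_gather_placements(hsdp))
# [Replicate(), Replicate(), Shard(dim=1)]
```

The old rule happens to be correct for a 2-D FSDP + TP mesh, but on a 3-D HSDP + TP mesh it leaves the placements unchanged, which is why the fix generalizes to "every dimension before TP".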
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118638
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wz337