[dtensor] fix allgather unpadding logic (#103219)
This PR fixes the allgather unpadding logic so that we only need to unpad
the full gathered tensor, instead of first chunking it into small tensors
and unpadding each one individually. This works because we know how our
padding algorithm behaves: padding only ends up at the tail of the
gathered tensor.
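For illustration, here is a minimal self-contained sketch of the idea (hypothetical helper names, not the actual DTensor code). It assumes `torch.chunk`-style sharding where every shard is padded up to `full_chunk_size = ceil(dim_size / world_size)`, so all padding lands at the tail of the concatenated result and a single `narrow` removes it:

```python
# Sketch only: illustrates why one unpad of the full tensor suffices.
# Helper names (shard_and_pad, allgather_and_unpad) are hypothetical.
import math
import torch

def shard_and_pad(tensor: torch.Tensor, world_size: int, dim: int = 0):
    """Split `tensor` along `dim` into world_size shards, padding each
    shard up to full_chunk_size so they all have equal size."""
    full_chunk_size = math.ceil(tensor.size(dim) / world_size)
    shards = list(torch.chunk(tensor, world_size, dim=dim))
    # torch.chunk may return fewer chunks than world_size; fill with empties.
    while len(shards) < world_size:
        empty_shape = list(tensor.shape)
        empty_shape[dim] = 0
        shards.append(tensor.new_zeros(empty_shape))
    padded = []
    for shard in shards:
        pad = full_chunk_size - shard.size(dim)
        if pad > 0:
            pad_shape = list(shard.shape)
            pad_shape[dim] = pad
            shard = torch.cat([shard, shard.new_zeros(pad_shape)], dim=dim)
        padded.append(shard)
    return padded

def allgather_and_unpad(shards, global_dim_size: int, dim: int = 0):
    # Stand-in for dist.all_gather: every rank ends up with the
    # concatenation of all (equally sized, padded) shards.
    gathered = torch.cat(shards, dim=dim)
    # Because padding only sits on the trailing shard(s), a single narrow
    # on the full tensor removes it -- no per-shard chunk/unpad needed.
    return gathered.narrow(dim, 0, global_dim_size)

t = torch.arange(10.0)
shards = shard_and_pad(t, world_size=4)       # sizes 3,3,3,1 -> padded to 3 each
out = allgather_and_unpad(shards, t.size(0))  # 12 elements narrowed back to 10
assert torch.equal(out, t)
```

The single `narrow` is both simpler and cheaper than the previous approach of re-chunking the gathered tensor and unpadding each chunk separately.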
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103219
Approved by: https://github.com/wz337, https://github.com/fduwjj