[FSDP2] Used `split_with_sizes_copy` for all-gather copy-out (#119451)
This switches the all-gather copy-out to @yifuwang's `split_with_sizes_copy.out` fast path!
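
For context, here is a minimal sketch of what a copy-out through `torch.split_with_sizes_copy` with `out=` can look like. It is not the FSDP2 code itself: the names `world_size`, `shard_numels`, `all_gather_output`, and `outs` and all sizes are hypothetical, and it runs on CPU only to show the call shape.

```python
import torch

# Hypothetical setup: 2 ranks, two parameters whose per-rank shards
# hold 3 and 5 elements respectively.
world_size = 2
shard_numels = [3, 5]

# Stand-in for the flat all-gather output, viewed as
# (world_size, total elements per rank).
all_gather_output = torch.arange(
    world_size * sum(shard_numels), dtype=torch.float32
).view(world_size, -1)

# Preallocated destination buffers, one (world_size, numel) tensor per parameter.
outs = [torch.empty(world_size, numel) for numel in shard_numels]

# Split along dim=1 and copy each piece straight into its preallocated buffer,
# instead of splitting into views and copying them one by one.
torch.split_with_sizes_copy(all_gather_output, shard_numels, dim=1, out=outs)
```

After the call, `outs[i]` holds the columns of `all_gather_output` that belong to parameter `i`.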
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119451
Approved by: https://github.com/yifuwang
ghstack dependencies: #118017, #118118