Fixed an issue where a universal checkpoint could not be loaded for stage 3 after the world size expands. (#7599)
When a checkpoint saved with world size 2 is converted to a universal
checkpoint and then loaded with world size 4, each new rank looks for
its own model file. Rank 3, for example, tries to load
`zero_pp_rank_3_mp_rank_00_model_states.pt`, but the previous run never
produced this file.
For stage 3, load the first file instead, that is
`zero_pp_rank_0_mp_rank_00_model_states.pt`.
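
The file selection described above can be sketched roughly as follows.
This is only an illustration of the idea, not the code in this change:
the helper name `model_states_filename` and its parameters are
hypothetical, and only the file name pattern is taken from the
description.

```python
def model_states_filename(dp_rank: int,
                          zero_stage: int,
                          load_universal: bool,
                          mp_rank: int = 0) -> str:
    """Return the model states file a rank should read (hypothetical helper)."""
    # Assumption: when loading a universal checkpoint under stage 3, a rank
    # added by world size expansion has no file of its own, so every rank
    # reads the file written by rank 0.
    if zero_stage == 3 and load_universal:
        dp_rank = 0
    return f"zero_pp_rank_{dp_rank}_mp_rank_{mp_rank:02d}_model_states.pt"


# Rank 3 exists only after expanding from 2 to 4 ranks.
print(model_states_filename(dp_rank=3, zero_stage=3, load_universal=True))
```

Outside the universal checkpoint path the sketch leaves the rank
untouched, so a regular load still resolves to each rank's own file.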
The existing unit test
`TestZeROUniversalCheckpointDP::test_dp_world_size_2to4` reproduces this
problem and verifies the fix.
---------
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>