Fix non-deterministic Megatron-LM checkpoint name (#24674)
`os.listdir` returns entries in arbitrary order, so taking the first
listed file (`os.listdir(...)[0]`), as the code does, is not deterministic.
It can return a checkpoint such as `distrib_optim.pt`, which does not
contain the information the code needs, such as the arguments originally
passed to Megatron-LM.
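
A minimal sketch of one deterministic alternative, not necessarily the
change made in this commit: look for known checkpoint file names in a
fixed order instead of relying on directory listing order. The helper
name `pick_checkpoint` and the candidate names `model_optim_rng.pt` and
`model_rng.pt` are assumptions for illustration.

```python
import os


def pick_checkpoint(directory, candidates=("model_optim_rng.pt", "model_rng.pt")):
    """Return the path of the first candidate file that exists in `directory`.

    The candidate names are illustrative assumptions. Checking explicit
    names in a fixed order avoids depending on `os.listdir` ordering.
    """
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        f"none of {list(candidates)} found in {directory!r}"
    )
```

With this approach a stray `distrib_optim.pt` in the same directory is
never selected, regardless of the order in which the file system lists it.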