[release 2.0.1] [fix] fix load_sharded_optimizer_state_dict error on multi node (#99103)
* fix load_sharded_optimizer_state_dict error on multi node (#98063)
Fixes #95892
This PR fixes a placement error in ChunkShardingSpec when training on multiple nodes. The placement should be 'rank:{global_rank}/cuda:{local_rank}', but 'rank:{global_rank}/cuda:{global_rank}' was used. Once a global rank is not a valid local device index, which happens on every node after the first, this results in a CUDA error: invalid device ordinal.
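The difference between the two placement strings can be sketched as follows. This is only an illustration, not the code from this PR: the helper names `buggy_placements` and `fixed_placements` are hypothetical, and the sketch assumes that ranks are assigned contiguously per node and that every node has the same number of GPUs, so that the local rank equals `global_rank % gpus_per_node`.

```python
def buggy_placements(world_size: int) -> list[str]:
    # Uses the global rank as the CUDA device index (the behavior described as wrong).
    return [f"rank:{rank}/cuda:{rank}" for rank in range(world_size)]


def fixed_placements(world_size: int, gpus_per_node: int) -> list[str]:
    # Assumption: local_rank == global_rank % gpus_per_node
    # (contiguous rank assignment, identical GPU count on every node).
    return [f"rank:{rank}/cuda:{rank % gpus_per_node}" for rank in range(world_size)]


if __name__ == "__main__":
    # Two nodes with two GPUs each.
    print(buggy_placements(4))
    print(fixed_placements(4, gpus_per_node=2))
```

With two nodes of two GPUs each, the buggy variant asks ranks 2 and 3 for devices cuda:2 and cuda:3, which do not exist on their node, while the fixed variant maps them to cuda:0 and cuda:1.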
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98063
Approved by: https://github.com/kumpera
* Update optimizer.py
Fix the cherry-pick by removing the code formatting changes that came with the original PR.
---------
Co-authored-by: Iris <wz337@cornell.edu>
Co-authored-by: Rodrigo Kumpera <kumpera@users.noreply.github.com>