pytorch
4e26ad78 - fix load_sharded_optimizer_state_dict error on multi node (#98063)

Commit
1 year ago
fix load_sharded_optimizer_state_dict error on multi node (#98063)

Fixes #95892

This PR fixes the placement error in ChunkShardingSpec when training on multiple nodes. The placement should be 'rank:{global_rank}/cuda:{local_rank}', but 'rank:{global_rank}/cuda:{global_rank}' was used. On any node other than the first, the global rank can exceed the number of local GPUs, which results in "CUDA error: invalid device ordinal".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98063
Approved by: https://github.com/kumpera
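
The difference between the two placement strings can be illustrated with a minimal sketch. This is not the code from the PR: `build_placements` and `gpus_per_node` are hypothetical names, and the sketch assumes that every node has the same number of GPUs and that global ranks are assigned contiguously per node, so the local rank is the global rank modulo the per-node GPU count.

```python
from typing import List


def build_placements(world_size: int, gpus_per_node: int) -> List[str]:
    """Build one placement string per global rank (hypothetical helper).

    Assumes ranks are laid out contiguously across nodes, so the local
    device index is global_rank % gpus_per_node.
    """
    placements = []
    for global_rank in range(world_size):
        local_rank = global_rank % gpus_per_node
        # Correct form: the device ordinal is the *local* rank.
        placements.append(f"rank:{global_rank}/cuda:{local_rank}")
    return placements


def build_placements_buggy(world_size: int) -> List[str]:
    """The faulty form described in the commit message, for comparison."""
    # The device ordinal is the *global* rank, which does not exist on
    # a node that has fewer GPUs than the global rank value.
    return [f"rank:{r}/cuda:{r}" for r in range(world_size)]


if __name__ == "__main__":
    # Two nodes with two GPUs each: four ranks in total.
    print(build_placements(4, 2))
    print(build_placements_buggy(4))
```

With two nodes of two GPUs each, the buggy form asks rank 2 to use `cuda:2`, a device that does not exist on a two-GPU node, which matches the invalid device ordinal error described above.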