[ZeroRedundancyOptimizer] Elastic and pytorch compatible checkpoints (#50956)
Summary:
- Makes it possible to load non-sharded (plain PyTorch) optimizer checkpoints, as long as the model and param groups match
- Makes it possible to save a checkpoint with one world size and load it with another (see the sketch below)
- Uses the Torch Distributed built-in `broadcast_object_list` instead of an ad-hoc version
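A minimal sketch of the round-trip this change enables, assuming a process group is already initialized (e.g. via `torchrun`) and using the `torch.distributed.optim.ZeroRedundancyOptimizer` API; the `save_checkpoint`/`load_checkpoint` helpers are illustrative, not part of this commit:

```python
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer

def save_checkpoint(model, optimizer, path):
    # Gather every rank's shard onto rank 0; the consolidated result has
    # the same layout as a plain torch.optim state dict.
    optimizer.consolidate_state_dict(to=0)
    if dist.get_rank() == 0:
        torch.save({"model": model.state_dict(),
                    "optim": optimizer.state_dict()}, path)
    dist.barrier()

def load_checkpoint(model, optimizer, path):
    # The checkpoint may come from a run with a different world size, or
    # from a non-sharded optimizer, as long as the param groups match.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optim"])

# Assumes dist.init_process_group() has already been called.
model = torch.nn.Linear(8, 8)
optimizer = ZeroRedundancyOptimizer(
    model.parameters(), optimizer_class=torch.optim.SGD, lr=0.1)
save_checkpoint(model, optimizer, "zero_ckpt.pt")
load_checkpoint(model, optimizer, "zero_ckpt.pt")
```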
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50956
Reviewed By: malfet
Differential Revision: D26113953
Pulled By: blefaudeux
fbshipit-source-id: 030bfeee2c34c2d987590d45dc8efe05515f2e5c