Support loading and saving ZeRO checkpoints with changing DP degree (#240)
* Support saving and loading ZeRO checkpoints across different data
parallelism degrees.
* Fix formatting
* Support checkpointing with a varying GPU count in ZeRO stage 1
* Fix formatting
* Formatting fixes
* Update model tests
* Remove pprint
* Minor fix
* Fix formatting
* Update model tests
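A minimal sketch of the general idea behind reloading partitioned optimizer state under a new DP degree. It assumes each rank stores one flat partition of equal size, padded at the end so the total divides evenly by the DP degree, and that the unpadded element count (`numel`) is known. The function names are hypothetical. This is not DeepSpeed's actual implementation: the sketch merges the saved partitions, strips the padding, then re-pads and re-splits for the new world size.

```python
from typing import List, Sequence


def merge_partitions(partitions: Sequence[Sequence[float]], numel: int) -> List[float]:
    """Hypothetical helper: concatenate per-rank flat partitions, drop trailing padding."""
    flat: List[float] = []
    for part in partitions:
        flat.extend(part)
    if len(flat) < numel:
        raise ValueError("partitions hold fewer elements than numel")
    return flat[:numel]


def split_partitions(flat: Sequence[float], dp_degree: int,
                     pad_value: float = 0.0) -> List[List[float]]:
    """Hypothetical helper: pad so the state divides evenly, then split per rank."""
    if dp_degree < 1:
        raise ValueError("dp_degree must be >= 1")
    part_size = -(-len(flat) // dp_degree)  # ceiling division
    padded = list(flat) + [pad_value] * (part_size * dp_degree - len(flat))
    return [padded[r * part_size:(r + 1) * part_size] for r in range(dp_degree)]


def repartition(partitions: Sequence[Sequence[float]], numel: int,
                new_dp_degree: int) -> List[List[float]]:
    """Hypothetical helper: state saved at one DP degree, reshaped for another."""
    return split_partitions(merge_partitions(partitions, numel), new_dp_degree)


if __name__ == "__main__":
    # 10 elements saved with DP=4 (padded to 12), then reloaded with DP=3.
    saved = split_partitions([float(i) for i in range(10)], 4)
    print(repartition(saved, numel=10, new_dp_degree=3))
```

Stripping the padding before re-splitting matters because the padding length depends on the DP degree: 10 elements need 2 pad slots at DP=4 but also 2 at DP=3, only by coincidence. In general the old padding must not be treated as real state.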
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>