Checkpoint reshaping (#1953)
* Add unit test, remove exception, add notes
* Move param_shapes to model files
* Remove hard-coded constants
* Condition on zero optimizer
* Add zero checkpoint merging
* Print checkpoint version
* Reshape zero_* ckpt files
* Merge zero* files contraction
* Utils for 3D contraction reshaping
* Remove bogus import
* Support bf16_zero ckpts
* Add param slice mappings
* Load universal checkpoints
* Per group mappings from Stas
* Hack to load bf16 zero files
* Param attributes
* WIP
* Fix API bug
* Update lp with local/remote hp
* Disable vocab padding handling
* Update z2 checkpoint
* Remove debug prints
* Remove debug prints; Rebase unit test
* Add reshape assert
* Padding
* Typo
* Catch nonexistent checkpoint path
* Cleanup
* Restore checkpoint state comparisons
* Add torch version guards
* More precise avoidance of false positives
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>