Add Distributed Checkpointing support (#3639)
* Change naming of moments to Moment_x_<weight_name>
* Add checkpointing code and ZeRO checkpoint aggregation
* Correct aggregation for LAMB, cleanup
* Add simple checkpointing test
* Add test for ZeRO checkpoint aggregation
* Fix tests
* Fix test
* Review changes
* Fix test after review comment fix
* Fix API, test
* Fix test after API change
* Decouple save load from ORTTrainer
* Add flag to not break checkpointing with ORTModel
Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>