Port legacy checkpoint API into new front-end (#4855)
* Port legacy checkpoint API into new front-end
This PR also:
* Fixes warnings on ORTTrainer caused by improper tensor copies
* Fixes inaccurate LRScheduler tests that used the wrong LR
* Updates stale DeepSpeed documentation
* Makes minor refactorings to the Toy BERT tests
* Move experimental state_dict() and load_state_dict() into the checkpoint namespace