Fix checkpoint API and improve loss scaler handling (#4950)

This PR also includes:
* More LossScaler tests
* Minor LossScaler improvement
* Check model after extra post processing
* Improve basic training tests to include all optimizers
* Set relative tolerance rtol=1e-7 for Legacy vs Experimental frontend API tests
* Increase number of training tests for Legacy vs Experimental tests
* Minor refactoring of existing tests
* Fix Checkpoint API for Gradient Accumulation / fp16 scenarios
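
For reference, the rtol=1e-7 check between Legacy and Experimental frontend results can be pictured roughly as below. This is only a minimal sketch, not the test code from this PR: `outputs_match` and its arguments are hypothetical names, and it uses the standard library's `math.isclose`, whose relative tolerance is scaled by the larger of the two magnitudes and so may differ slightly from the comparison helper the tests actually use.

```python
import math


def outputs_match(legacy, experimental, rtol=1e-7):
    """Hypothetical helper: True if paired values agree within rtol.

    Assumption: both arguments are flat sequences of floats, e.g. per-step
    loss values collected from each frontend.
    """
    if len(legacy) != len(experimental):
        return False
    # abs_tol=0.0 keeps the check purely relative.
    return all(
        math.isclose(a, b, rel_tol=rtol, abs_tol=0.0)
        for a, b in zip(legacy, experimental)
    )


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(outputs_match([1.0, 2.0], [1.0 + 1e-9, 2.0]))
    print(outputs_match([1.0, 2.0], [1.0 + 1e-6, 2.0]))
```

A purely relative tolerance this tight means the two frontends are expected to agree to roughly seven significant digits.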