[trainer/deepspeed] load_best_model (reimplement re-init) (#17151)
* [trainer/deepspeed] load_best_model
* to sync with DS PR #1947
* simplify
* rework load_best_model test
* cleanup
* bump deepspeed>=0.6.5
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>