Fix dataloader state dict bugs and incorporate the load_state/save_state API (#3034)
* v1
* More testing, need to try on H100
* Bigger batch for H100 test
* Test tweak
* Fixup all tests!
* Bookmark
* Fix issues, working now
* Remove num samples
* Uncomment
* Give stateful DL end of DL
* Make skip DL stateful
* Migrate to update_state_dict
* try/finally
* Add comments to test
* Remove comment
* Document
* Refactor out for eventual override
* Doc nit
* Brute force it