Fix gradient checkpointing + fp16 autocast for most models (#24247)
* fix gradient checkpointing (gc) bug
* continue PoC on OPT
* fixes
* :exploding_head:
* fix tests
* remove pytest.mark
* fixup
* forward contrib credits from discussions
* revert changes on untouched files
---------
Co-authored-by: zhaoqf123 <zhaoqf123@users.noreply.github.com>
Co-authored-by: 7eu7d7 <7eu7d7@users.noreply.github.com>