Make gradient-checkpoint enabling tolerant of models without get_input_embeddings (#42558)
* add embedding getter
* modify your own logic
* a common test
* some adapters are not PreTrainedModel instances
* few fixes
* implement correct-ish fix?
* fixup
* this is needed likely
* whoops
* solve some cross-import issues here and there
* more cross-import issues
* finally
* revert changes
* fixups
* improve message
* add common tests for input_ids first
* increase test coverage
* bigger update for GC
* copies
* mlcd is getting on my nerves a bit
* ah yes
* for BC
* break a couple modelings
* simplify with base_model
* fix copies for torch checkpointing
* simplify this model
* improve messages