Attempt to fix VLM gradient enabling (#41993)
* attempt to fix gradients
* Improve tests, use PreTrainedModel hooks, cleanup
* missing patch_embed
* fix arg name
* local revert
* adapt BART test
* fix lingering failures
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>