Share embedding modules in BART, not only weights (#41821)
* Share embedding modules in BART, not only weights
Embedding modules are now shared between the encoder, the decoder,
and `self.shared` - it is the same module, as in the T5 implementation.
This has the benefit that it no longer matters which of these modules
`get_input_embeddings` returns: callers can be sure that modifications
made to it (e.g., hooks) apply to the embeddings actually used in the
forward pass.
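A minimal sketch of the idea (class and attribute names here are illustrative, not the actual BART code): instead of creating separate `nn.Embedding` modules and tying only their weight tensors, the submodules reuse the very same module object.

```python
import torch.nn as nn

class SharedEmbeddingSketch(nn.Module):
    def __init__(self, vocab_size: int = 50265, d_model: int = 768):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, d_model)
        # Previously: encoder/decoder each held their own nn.Embedding whose
        # weight tensor was tied to self.shared.weight (same Parameter,
        # different modules). Now the module object itself is reused.
        self.encoder_embed_tokens = self.shared
        self.decoder_embed_tokens = self.shared

    def get_input_embeddings(self):
        # All three attributes point to the same object, so a hook registered
        # on the returned module runs whenever the encoder or decoder embeds.
        return self.shared
```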
Background: While revamping the gradient checkpointing tests in PEFT via
peft#2860, we found that the gradient-enabling step
(`modeling_utils.enable_input_require_grads`) does not work for BART.
This causes gradient checkpointing with `use_reentrant=True` to fail,
as it does not detect any gradients. The reason is that the value
returned by `get_input_embeddings` (`self.shared`) is not the module
that is actually called in the encoder, so any hooks added to
`self.shared` never run - in this case the hook set by
`enable_input_require_grads`.
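To illustrate the failure mode, here is a self-contained sketch of the old behavior (the hook body mirrors what `enable_input_require_grads` registers, but the setup is simplified and the names are mine):

```python
import torch
import torch.nn as nn

shared = nn.Embedding(10, 4)
encoder_embed = nn.Embedding(10, 4)
encoder_embed.weight = shared.weight  # weights tied, modules distinct
shared.weight.requires_grad_(False)   # e.g., a frozen base model under PEFT

# enable_input_require_grads registers a forward hook on the module
# returned by get_input_embeddings() that forces its output to require grads.
fired = []
def make_inputs_require_grads(module, inputs, output):
    fired.append(True)
    output.requires_grad_(True)

shared.register_forward_hook(make_inputs_require_grads)

# The encoder calls its own embedding module, so the hook never runs and
# reentrant gradient checkpointing sees no tensor that requires gradients.
out = encoder_embed(torch.tensor([1, 2, 3]))
assert not fired and not out.requires_grad
```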
Since the root cause is a hook that never runs, I've added a test that
directly checks that hooks can be registered on the input embeddings
and that they are actually called.
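The added test is along these lines (a sketch of the idea, not the literal test code from the PR):

```python
import torch

def check_input_embedding_hook_is_called(model, input_ids):
    # A hook registered on get_input_embeddings() must fire during forward,
    # which only holds if the module (not just its weight) is shared.
    fired = []
    handle = model.get_input_embeddings().register_forward_hook(
        lambda module, inputs, output: fired.append(True)
    )
    try:
        model(input_ids=input_ids)
    finally:
        handle.remove()
    assert fired, "hook on get_input_embeddings() was never called"
```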
* Add explanatory comment
* Don't initialize embeddings when not necessary (see the sketch below)
* make fix-copies
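A sketch of the lazy-initialization point above (the constructor signature is hypothetical; the actual submodule code differs): a submodule only creates its own embedding when no shared one is passed in, so no weights are initialized just to be thrown away.

```python
import torch.nn as nn
from typing import Optional

class EncoderSketch(nn.Module):
    def __init__(self, vocab_size: int, d_model: int,
                 embed_tokens: Optional[nn.Embedding] = None):
        super().__init__()
        # Reuse the shared module when given; only allocate (and initialize)
        # a fresh embedding when the encoder is used standalone.
        self.embed_tokens = (
            embed_tokens if embed_tokens is not None
            else nn.Embedding(vocab_size, d_model)
        )
```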
---------
Co-authored-by: nemo <git@ningu.net>