Share embedding modules in BART, not only weights (#41821)
* Share embedding modules in BART, not only weights
Embedding modules are now shared between the encoder, the decoder,
and `self.shared` - it is the same module, as in the T5 implementation.
This has the benefit that it no longer matters which of these modules
`get_input_embeddings` returns: callers can be sure that modifications
made to it (e.g., hooks) apply to the embeddings actually used in the
forward pass.
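A minimal sketch of the idea (class and attribute names here are illustrative, not the actual BART code): instead of creating separate `nn.Embedding` modules and tying only their weight tensors, the submodules reuse the very same module object.

```python
import torch.nn as nn

class SharedEmbeddingSketch(nn.Module):
    def __init__(self, vocab_size: int = 50265, d_model: int = 768):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, d_model)
        # Previously: encoder/decoder each held their own nn.Embedding whose
        # weight tensor was tied to self.shared.weight (same Parameter,
        # different modules). Now the module object itself is reused.
        self.encoder_embed_tokens = self.shared
        self.decoder_embed_tokens = self.shared

    def get_input_embeddings(self):
        # All three attributes point to the same object, so a hook registered
        # on the returned module runs whenever the encoder or decoder embeds.
        return self.shared
```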
Background: While revamping the gradient checkpointing tests in PEFT via
peft#2860, we found that the gradient-enabling step
(`modeling_utils.enable_input_require_grads`) does not work for BART.
This causes gradient checkpointing with `use_reentrant=True` to fail,
as it does not detect any gradients. The reason is that the value
returned by `get_input_embeddings` (`self.shared`) is not the module
that is actually called in the encoder, so any hooks added to
`self.shared` never run - in this case the hook set by
`enable_input_require_grads`.
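To illustrate the failure mode, here is a self-contained sketch of the old behavior (the hook body mirrors what `enable_input_require_grads` registers, but the setup is simplified and the names are mine):

```python
import torch
import torch.nn as nn

shared = nn.Embedding(10, 4)
encoder_embed = nn.Embedding(10, 4)
encoder_embed.weight = shared.weight  # weights tied, modules distinct
shared.weight.requires_grad_(False)   # e.g., a frozen base model under PEFT

# enable_input_require_grads registers a forward hook on the module
# returned by get_input_embeddings() that forces its output to require grads.
fired = []
def make_inputs_require_grads(module, inputs, output):
    fired.append(True)
    output.requires_grad_(True)

shared.register_forward_hook(make_inputs_require_grads)

# The encoder calls its own embedding module, so the hook never runs and
# reentrant gradient checkpointing sees no tensor that requires gradients.
out = encoder_embed(torch.tensor([1, 2, 3]))
assert not fired and not out.requires_grad
```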
Since the root cause is a hook that never runs, I've added a test that
directly checks that hooks can be registered on the input embeddings
and that they are actually called.
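The added test is along these lines (a sketch of the idea, not the literal test code from the PR):

```python
import torch

def check_input_embedding_hook_is_called(model, input_ids):
    # A hook registered on get_input_embeddings() must fire during forward,
    # which only holds if the module (not just its weight) is shared.
    fired = []
    handle = model.get_input_embeddings().register_forward_hook(
        lambda module, inputs, output: fired.append(True)
    )
    try:
        model(input_ids=input_ids)
    finally:
        handle.remove()
    assert fired, "hook on get_input_embeddings() was never called"
```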
* Add explanatory comment
* Don't initialize embeddings when not necessary (see the sketch below)
* make fix-copies
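A sketch of the lazy-initialization point above (the constructor signature is hypothetical; the actual submodule code differs): a submodule only creates its own embedding when no shared one is passed in, so no weights are initialized just to be thrown away.

```python
import torch.nn as nn
from typing import Optional

class EncoderSketch(nn.Module):
    def __init__(self, vocab_size: int, d_model: int,
                 embed_tokens: Optional[nn.Embedding] = None):
        super().__init__()
        # Reuse the shared module when given; only allocate (and initialize)
        # a fresh embedding when the encoder is used standalone.
        self.embed_tokens = (
            embed_tokens if embed_tokens is not None
            else nn.Embedding(vocab_size, d_model)
        )
```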
---------
Co-authored-by: nemo <git@ningu.net>