Deal with weight tying in transformers >=5 (#2922)
While we already implemented forward compatibility with the way transformers>=5
handles weight tying, there was still an issue with weight tying when trainable
tokens wrappers are involved.
Before, we simply got fixed strings naming the modules that are tied to the embeddings,
e.g. `"lm_head"`; this never changed, since it was just a static property of the
respective `PreTrainedModel` class. However, with the new way `get_tied_weights_keys`
is implemented, the names of the tied-to-embeddings modules change if they are
moved around. So if we wrap the `lm_head` once in a trainable tokens wrapper, it'll
become `lm_head.token_adapter.base_layer` instead of `lm_head`. That means the
check to see whether we already wrapped the tied layer needs to look at the
grandparent module instead of the target layer itself.
This obviously assumes that we always have a nesting level of two, which is true
for `TrainableTokensWrapper`.
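For illustration, a minimal sketch of such a grandparent check could look like the
following; the helper name `is_already_wrapped` and the `wrapper_cls` argument are
placeholders here, not the actual PEFT code:

```python
import torch.nn as nn

def is_already_wrapped(model: nn.Module, tied_key: str, wrapper_cls: type) -> bool:
    """Check whether the module behind a tied-weights key is already wrapped.

    An unwrapped head is reported as e.g. "lm_head"; once it is wrapped, the key
    becomes "lm_head.token_adapter.base_layer", so the wrapper sits exactly two
    levels above the reported module (the fixed nesting level mentioned above).
    """
    parts = tied_key.split(".")
    if len(parts) < 3:
        # No nesting yet: the key still points at the original, unwrapped layer.
        return False
    grandparent_name = ".".join(parts[:-2])
    # get_submodule is a standard torch.nn.Module method.
    return isinstance(model.get_submodule(grandparent_name), wrapper_cls)
```

In PEFT terms, `wrapper_cls` would correspond to `TrainableTokensWrapper`, and stripping
exactly two name components is what ties this check to the assumed nesting level of two.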