transformers
9264fc91 - Inconsistency in PreTrainedModel.resize_token_embeddings When ZeRO3 Is Enabled (#25394)

Commit

2 years ago

Inconsistency in PreTrainedModel.resize_token_embeddings When ZeRO3 Is Enabled (#25394)

* Inconsistency in PreTrainedModel.resize_token_embeddings

This PR addresses https://github.com/huggingface/transformers/issues/25241.

In the previous implementation, when ZeRO stage 3 was enabled, resize_token_embeddings would create independent PyTorch weights on each device. Here we ensure that the new embeddings are created with DeepSpeed init and are properly partitioned across devices.

* formatting with black

* adding the removed comments back in

---------

Co-authored-by: Sina Moeini <smoeini@amazon.com>
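The behaviour described in the message can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: the helper name `resize_embedding` and its `ds_config` parameter are hypothetical, and the DeepSpeed branch assumes `deepspeed` is installed and a distributed ZeRO stage 3 run is already set up. It relies only on `deepspeed.zero.Init` and `deepspeed.zero.GatheredParameters`; without a config it falls back to plain PyTorch.

```python
import torch
from torch import nn


def resize_embedding(old: nn.Embedding, new_num_tokens: int, ds_config=None) -> nn.Embedding:
    """Hypothetical helper: return a resized copy of `old`.

    If `ds_config` (a DeepSpeed config dict or path, an assumption of this
    sketch) is given, the new weight is created under deepspeed.zero.Init so
    it is partitioned across ranks instead of being a full tensor per device.
    """
    # Module attributes stay valid even when the weight itself is partitioned.
    dim = old.embedding_dim
    n = min(old.num_embeddings, new_num_tokens)  # rows to carry over

    if ds_config is None:
        # Plain PyTorch path: every process holds the full weight.
        new = nn.Embedding(
            new_num_tokens, dim, device=old.weight.device, dtype=old.weight.dtype
        )
        new.weight.data[:n, :] = old.weight.data[:n, :]
        return new

    # ZeRO stage 3 path (requires deepspeed and an initialized distributed run).
    import deepspeed

    # Create the new parameter already partitioned across devices.
    with deepspeed.zero.Init(config_dict_or_path=ds_config):
        new = nn.Embedding(new_num_tokens, dim)

    # Temporarily gather both weights; rank 0's modifications are
    # re-partitioned to all ranks when the context exits.
    params = [old.weight, new.weight]
    with deepspeed.zero.GatheredParameters(params, modifier_rank=0):
        new.weight.data[:n, :] = old.weight.data[:n, :]
    return new
```

Calling `resize_embedding(nn.Embedding(10, 4), 12)` exercises the plain path and keeps the first 10 rows; passing a ZeRO stage 3 config exercises the partitioned path, which is the case the commit message is concerned with.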

References

#25394 - Inconsistency in PreTrainedModel.resize_token_embeddings When ZeRO3 Is Enabled

Author

sinamoeini

Parents

b4d55488