transformers
3059d80d
- [DeepSpeed ZeRO3] Fix performance degradation in sharded models (#18911)
Commit
3 years ago
[DeepSpeed ZeRO3] Fix performance degradation in sharded models (#18911)

* [DeepSpeed] Fix performance degradation in sharded models
* style
* polish

Co-authored-by: Stas Bekman <stas@stason.org>
References
#18911 - [DeepSpeed ZeRO3] Fix performance degradation in sharded models
#19449 - [WIP] Fix weights initialization of several vision models
#27720 - Add common processor tests
#29969 - [SigLIP] Add fast tokenizer
#32831 - [Docs] Update resources
#33111 - [Backbone] Remove out_features everywhere
#33174 - [Zero-shot image classification pipeline] Remove tokenizer_kwargs
#39821 - Support MetaCLIP 2
#59 - Fix attention mask handling in EoMT-DINOv3 converter
#62 - Add initial DEIMv2 model implementation
Author
tjruwase
Parents
10c774cf