fix overflow when training mDeberta in fp16 (#24116)
* Porting changes from https://github.com/microsoft/DeBERTa/ that hopefully allow fp16 training of mDeberta
* Updates to deberta modeling from microsoft repo
* Performing some cleanup
* Undoing changes that weren't necessary
* Undoing the float() calls
* Minimally changing the p2c block
* Fix error
* Minimally changing the c2p block
* Switch to torch.sqrt (see the PyTorch sketch after this list)
* Remove the math import
* Adding back the .to() calls on scale
* Undoing attention_scores change
* Removing commented out code
* Updating modeling_sew_d.py to satisfy utils/check_copies.py
* Missed change
* Further reduce changes needed to get fp16 working
* Reverting changes to modeling_sew_d.py
* Make the same change in TF (see the TensorFlow sketch after this list)
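
The commits above describe computing the attention scale with torch.sqrt instead of math.sqrt, casting it back to the activation dtype with .to(), and applying the scaling so that the raw attention products never materialize at full magnitude in fp16. The sketch below illustrates that pattern; the function names, shapes, and demo values are invented for illustration and this is not the exact diff applied to modeling_deberta_v2.py.

```python
import torch


def scores_scaled_after_matmul(query, key, scale_factor):
    # Overflow-prone in fp16: the raw Q.K^T products are formed first and can
    # exceed the fp16 maximum (~65504) before the division by scale.
    scale = torch.sqrt(torch.tensor(query.size(-1), dtype=torch.float) * scale_factor)
    return torch.bmm(query, key.transpose(-1, -2)) / scale.to(dtype=query.dtype)


def scores_scaled_before_matmul(query, key, scale_factor):
    # fp16-friendly: compute the scale in float32 with torch.sqrt, cast it back
    # to the activation dtype, and divide one operand *before* the matmul so
    # the intermediate products stay within fp16 range.
    scale = torch.sqrt(torch.tensor(query.size(-1), dtype=torch.float) * scale_factor)
    return torch.bmm(query, key.transpose(-1, -2) / scale.to(dtype=query.dtype))


if __name__ == "__main__":
    # Constant activations chosen so a 64-dim dot product (32 * 32 * 64 = 65536)
    # just exceeds the fp16 maximum. fp16 matmul may require a GPU on older
    # CPU-only PyTorch builds.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    q = torch.full((1, 8, 64), 32.0, dtype=torch.float16, device=device)
    k = torch.full((1, 8, 64), 32.0, dtype=torch.float16, device=device)
    print(torch.isinf(scores_scaled_after_matmul(q, k, 3)).any())   # expected: tensor(True)
    print(torch.isinf(scores_scaled_before_matmul(q, k, 3)).any())  # expected: tensor(False)
```

The placement of the division is what matters: dividing one operand before the matmul keeps the fp16 intermediates bounded, while computing the square root on a float32 tensor avoids precision loss in the scale itself.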
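
For the TensorFlow side mentioned in the last commit, the analogous change would look roughly as follows; again, the function name and arguments are invented for illustration and are not the actual TFDebertaV2 code.

```python
import tensorflow as tf


def fp16_safe_scores(query, key, scale_factor):
    # Same idea as the PyTorch sketch: build the scale in float32 with
    # tf.math.sqrt, cast it back to the activation dtype, and divide one
    # operand before the matmul so fp16 intermediates do not overflow.
    scale = tf.math.sqrt(tf.cast(tf.shape(key)[-1] * scale_factor, tf.float32))
    return tf.matmul(query, key / tf.cast(scale, dtype=key.dtype), transpose_b=True)
```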