transformers
860b898d - fix: astronomical loss with ModernBERT when using gradient checkpointing (#38982) (#38983)

Commit
187 days ago
fix: astronomical loss with ModernBERT when using gradient checkpointing (#38982) (#38983)

* fix: astronomical loss with ModernBERT when using gradient checkpointing
* update the modeling fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>