transformers
Fix bug in gpt2's (from-scratch) special scaled weight initialization
#17877
Merged
