transformers
11f3ec72 - Add LayerScale to NAT/DiNAT (#20325)

Committed 3 years ago
Add LayerScale to NAT/DiNAT (#20325)

* Add LayerScale to NAT/DiNAT.

  Completely dropped the ball on LayerScale in the original PR (#20219). This is just an optional argument in both models, and is only activated for larger variants in order to provide training stability.

* Add LayerScale to NAT/DiNAT. Minor error fixed.

Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
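For context, LayerScale (introduced in CaiT) multiplies each residual branch's output by a learnable per-channel vector initialized near zero, which keeps deep residual blocks close to identity early in training. The sketch below is a minimal, hypothetical illustration of the technique; the class name, `init_value` parameter, and block wiring are assumptions and may not match the actual NAT/DiNAT code in transformers.

```python
import torch
from torch import nn


class LayerScale(nn.Module):
    """Per-channel learnable scaling of a residual branch (CaiT-style).

    Hypothetical sketch; the real NAT/DiNAT integration may differ
    in naming and wiring.
    """

    def __init__(self, dim: int, init_value: float = 1e-5):
        super().__init__()
        # Initialized near zero so the residual branch starts almost
        # disabled, stabilizing training of larger model variants.
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.gamma * hidden_states


# In a transformer block, the scale is typically applied to each
# sub-layer's output just before the residual addition, e.g.:
#   x = x + layer_scale_1(attention(norm1(x)))
#   x = x + layer_scale_2(mlp(norm2(x)))
x = torch.randn(2, 8, 64)
scale = LayerScale(dim=64)
out = scale(x)
```

Making it optional, as the commit describes, would amount to replacing the module with `nn.Identity()` (or a no-op) when the feature is disabled in the model config.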