Add LayerScale to NAT/DiNAT (#20325)
* Add LayerScale to NAT/DiNAT.
Completely dropped the ball on LayerScale in the original PR (#20219).
This is just an optional argument in both models, and is only activated for larger variants in order to provide training stability.
* Add LayerScale to NAT/DiNAT.
Minor error fixed.
Co-authored-by: Ali Hassani <ahassanijr@gmail.com>