Allow disabling bias for Transformer (#101687)
Bias-free layers are used by T5 and PaLM; the PaLM paper cites "increased training stability for large models" (https://arxiv.org/abs/2204.02311).
Depends on #101683, which allows disabling bias for `LayerNorm`s. Marked as draft until that PR lands.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101687
Approved by: https://github.com/mikaylagawarecki