preserve residual in transformer norm_first (#61692)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61692
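Illustration only, not the code from this PR: a minimal pure-Python sketch of what "preserving the residual" could mean for a pre-norm (norm_first) layer, assuming the skip connection is meant to carry the un-normalized input. layer_norm, toy_sublayer, and both pre_norm_* functions are hypothetical stand-ins written for this sketch, not PyTorch APIs.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward block."""
    return [0.5 * v for v in x]


def pre_norm_residual_preserved(x):
    # The skip path carries the original x; only the sublayer input is normalized.
    return [a + b for a, b in zip(x, toy_sublayer(layer_norm(x)))]


def pre_norm_residual_lost(x):
    # x is overwritten by its normalized value first, so the skip path
    # carries layer_norm(x) and the original activations are discarded.
    x = layer_norm(x)
    return [a + b for a, b in zip(x, toy_sublayer(x))]


x = [10.0, 12.0, 14.0]
preserved = pre_norm_residual_preserved(x)
lost = pre_norm_residual_lost(x)
```

In this toy setup the preserved variant keeps the input's mean, while the lost variant re-centers the output around zero, which is the kind of difference the title refers to.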
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29706830
Pulled By: bhosmer
fbshipit-source-id: d9c9e88fb589d46189955a96909c6ca76d587f72