[PyTorch] Hit fused addmm path in linear() for existing MHA (#72871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72871
The native MHA implementation already uses this trick to hit the fused addmm path in linear(); backport it to the existing MHA implementation so the two can be compared fairly.
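For illustration only (a sketch, not the code from this PR): assuming the trick amounts to handing linear() a 2-D input so that the matmul and bias add can be served by a single addmm call, the idea looks roughly like the following. The shapes and variable names are hypothetical, and whether a 3-D input misses the fused path depends on the PyTorch version.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes, chosen only for illustration.
tgt_len, bsz, embed_dim = 4, 2, 8
x = torch.randn(tgt_len, bsz, embed_dim)
weight = torch.randn(embed_dim, embed_dim)
bias = torch.randn(embed_dim)

# 3-D input: linear() may fall back to a matmul followed by a separate add.
out_3d = F.linear(x, weight, bias)

# Flatten the leading dims so linear() sees a 2-D input with a bias
# (the case that maps onto addmm), then restore the original shape.
out_2d = F.linear(x.view(tgt_len * bsz, embed_dim), weight, bias)
out_2d = out_2d.view(tgt_len, bsz, embed_dim)

# The equivalent explicit addmm call: bias + x2d @ weight.T
out_addmm = torch.addmm(bias, x.view(-1, embed_dim), weight.t())
out_addmm = out_addmm.view(tgt_len, bsz, embed_dim)
```

All three results should agree numerically; only the kernel path taken differs.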
ghstack-source-id: 149526858
Test Plan: CI
Reviewed By: ngimel
Differential Revision: D34176090
fbshipit-source-id: 8b578c29c4dcf0d85bae74dfbbb82db9a8f32dc7
(cherry picked from commit fd50170935cfe790466c0eb59575ef84003b6ca6)