[PyTorch] MHA: guard epilogue TODOs w/checks & implement 1 (#72461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72461
The CPU implementation had TODOs about missing epilogues on vectorized loops. Guard some of them with checks, and implement the one that was failing.
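This commit does not include the kernel code itself. As a rough, generic illustration of the two options (guarding vs. implementing an epilogue), here is a minimal sketch. It is not the PyTorch MHA kernel: `kVecSize`, `scale_vec`, `scale_guarded`, and `scale_with_epilogue` are hypothetical names. The inner loop in `scale_vec` stands in for a single SIMD operation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical vector width; real code would take it from the SIMD type's
// lane count rather than hard-coding it.
constexpr std::size_t kVecSize = 8;

// Stand-in for one SIMD instruction: scales kVecSize contiguous elements.
inline void scale_vec(float* data, float factor) {
  for (std::size_t lane = 0; lane < kVecSize; ++lane) {
    data[lane] *= factor;
  }
}

// Option 1: no epilogue, so guard with a check instead of silently
// leaving the tail elements unprocessed.
void scale_guarded(std::vector<float>& x, float factor) {
  if (x.size() % kVecSize != 0) {
    throw std::invalid_argument("size must be a multiple of kVecSize");
  }
  for (std::size_t i = 0; i < x.size(); i += kVecSize) {
    scale_vec(x.data() + i, factor);
  }
}

// Option 2: vectorized main loop plus a scalar epilogue for the remainder.
void scale_with_epilogue(std::vector<float>& x, float factor) {
  const std::size_t n = x.size();
  const std::size_t vec_end = n - n % kVecSize;  // end of the full-vector range
  assert(vec_end % kVecSize == 0 && vec_end <= n);
  std::size_t i = 0;
  for (; i < vec_end; i += kVecSize) {
    scale_vec(x.data() + i, factor);
  }
  // Epilogue: process the n % kVecSize leftover elements one at a time.
  for (; i < n; ++i) {
    x[i] *= factor;
  }
}
```

Without the epilogue, a 10-element input would leave the last 2 elements untouched. The guard turns that silent bug into an explicit error, and the epilogue removes the restriction altogether.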
ghstack-source-id: 149067336
Test Plan: Cosine similarity with the existing implementation is unchanged for the CPU implementation. This is surprising: IIUC, it should have improved.
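The test plan does not say how the comparison was computed. The following is a minimal sketch of the metric, assuming both implementations' outputs are flattened into equal-length float vectors. `cosine_similarity` is a hypothetical helper here, not a PyTorch API. It computes dot(a, b) / (|a| * |b|).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two flattened outputs: dot(a, b) / (|a| * |b|).
// Accumulates in double to reduce rounding error from float inputs.
// Returns 0.0 if either input has zero norm (a convention chosen for this sketch).
double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    dot += static_cast<double>(a[i]) * b[i];
    norm_a += static_cast<double>(a[i]) * a[i];
    norm_b += static_cast<double>(b[i]) * b[i];
  }
  if (norm_a == 0.0 || norm_b == 0.0) {
    return 0.0;
  }
  return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

A value of 1.0 means the two outputs point in the same direction. Because the metric is insensitive to overall scale, a fix that only touches a few tail elements per row may barely move it.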
Reviewed By: zrphercule, ngimel
Differential Revision: D33988259
fbshipit-source-id: 72739b7ea210c6e51a76f356a77e49ea00095f49
(cherry picked from commit e1ea8aa405fba4b19d6549bca19a79b5e7841049)