Add flag to optionally average output attention weights across heads (#70055)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47583
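
A minimal usage sketch, assuming the flag in question is the `average_attn_weights` keyword argument of `nn.MultiheadAttention.forward` (the commit message does not name it) and that it defaults to `True`, which keeps the earlier head-averaged output. The tensor sizes below are arbitrary illustration values:

```python
import torch
import torch.nn as nn

# Arbitrary sizes chosen for illustration only.
embed_dim, num_heads = 8, 2
batch, tgt_len, src_len = 3, 4, 5

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

query = torch.randn(batch, tgt_len, embed_dim)
key = torch.randn(batch, src_len, embed_dim)
value = torch.randn(batch, src_len, embed_dim)

with torch.no_grad():
    # Assumed default: weights are averaged across heads.
    _, avg_weights = mha(query, key, value, need_weights=True,
                         average_attn_weights=True)
    # With the flag off, one weight matrix is returned per head.
    _, per_head_weights = mha(query, key, value, need_weights=True,
                              average_attn_weights=False)

print(avg_weights.shape)       # (batch, tgt_len, src_len)
print(per_head_weights.shape)  # (batch, num_heads, tgt_len, src_len)
```

Averaging the per-head weights over the head dimension should reproduce the averaged output, so callers who need head-level attention maps can opt in without changing existing behavior.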
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70055
Reviewed By: bhosmer
Differential Revision: D33457866
Pulled By: jbschlosser
fbshipit-source-id: 17746b3668b0148c1e1ed8333227b7c42f1e3bf5