Disable reduced precision reductions for fp16 GEMMs (#67578)
Summary:
Most NVIDIA architectures do not appear to use reduced precision reductions (e.g., accumulating in fp16 when the inputs are fp16), or at least there have been few reports of this issue. This change attempts to ensure that a reduced precision reduction is never done for fp16 GEMMs. The included test case currently fails on Volta but passes on Pascal and Ampere; with the flag set by this change, the test passes on all three.
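
To illustrate why the accumulation precision matters, here is a minimal CPU-only sketch that emulates a dot product (one output element of a GEMM) with an fp16 accumulator versus an fp32 accumulator. It is not the cuBLAS code path and not the test included in this change; the helper names (to_fp16, to_fp32, dot_fp16_accum, dot_fp32_accum) are hypothetical and exist only for this illustration. Rounding is emulated with Python's struct module, which is an assumption about how to model the hardware behavior, not a description of it.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_fp16_accum(a, b):
    """Dot product of fp16 inputs with the running sum kept in fp16.

    This models a reduced precision reduction: every partial sum is
    rounded back to fp16 before the next product is added.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x) * to_fp16(y))
    return acc


def dot_fp32_accum(a, b):
    """Dot product of fp16 inputs with the running sum kept in fp32.

    Only the final result is rounded to fp16, as when the reduction
    itself is done in higher precision.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_fp16(x) * to_fp16(y))
    return to_fp16(acc)


if __name__ == "__main__":
    ones = [1.0] * 4096
    # fp16 has an 11-bit significand, so 2048 + 1 rounds back to 2048
    # and the fp16 accumulator stops growing.
    print(dot_fp16_accum(ones, ones))  # 2048.0
    # The fp32 accumulator reaches 4096, which fp16 represents exactly.
    print(dot_fp32_accum(ones, ones))  # 4096.0
```

With 4096 products of 1.0, the fp16 accumulator stalls at 2048 while the fp32 accumulator yields the exact answer, which is the kind of discrepancy a reduced precision reduction can introduce for long reduction dimensions.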
CC stas00 ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67578
Reviewed By: mruberry
Differential Revision: D32046030
Pulled By: ngimel
fbshipit-source-id: ac9aa8489ad6835f34bd0300c5d6f4ea76f333d1