Only call into reducer if torch.is_grad_enabled() (#19897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19897
During validation, gradient reduction is not needed, and autograd is
never called. The model output will always be a detached tensor. After
the new reducer was merged, this meant that it would find all model
parameters unused, and kick off reduction for them. When #19799 and
output where no parameters are used and it tries to kick off reduction
of zeroed gradients. Test for `torch.is_grad_enabled()` and
`self.training` before calling into the reducer.
Reviewed By: mrshenli
Differential Revision: D15118726
fbshipit-source-id: b0208f632a61cbe8110fa626fa427937b7f05924