Properly initialize `grad_weight` in `raw_cudnn_convolution_backward_weight_out` (#72157)
Summary:
https://github.com/pytorch/pytorch/issues/71521 attempted to fix an issue where the `test_conv_large` test produced `NaN` values after the backward pass, making the comparison between the actual and expected results meaningless. While tweaking the initialization of the conv layer appeared to fix the behavior, it was only masking the real issue: `grad_weight` is not guaranteed to be initialized in `raw_cudnn_convolution_backward_weight_out` when the backward operation is split.
Specifically, `grad_weight` is expected to be written directly by a cuDNN kernel (which happens in the common, unsplit case), so it normally does not need to be initialized. When the operation is split, however, an intermediate `grad_weight_` tensor holds the per-split gradients, which are then accumulated into `grad_weight` without zeroing it first. This PR changes the split path so that accumulation starts from a zeroed tensor, and additionally performs the accumulation in an accumulation dtype. The hacky workaround that masked the issue is reverted, while the safeguard against comparing `NaN` values (using the reference tensor for the scale computation) is kept in place.
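As an illustration of the failure mode and of the fix described above, here is a minimal Python sketch; the names `partials`, `grad_weight_bad`, and `acc` are hypothetical stand-ins for the ATen internals, not the actual `raw_cudnn_convolution_backward_weight_out` code. Accumulating per-split gradients into a buffer created with `torch.empty` starts the sum from undefined memory, whereas zeroing the buffer first and accumulating in `float32` yields the intended result.

```python
import torch

# Per-split partial weight gradients, standing in for what each cuDNN call
# writes into the intermediate `grad_weight_` tensor.
partials = [torch.randn(64, 3, 7, 7).half() for _ in range(4)]

# Buggy pattern: `torch.empty` leaves the buffer uninitialized, so the
# accumulation starts from garbage (possibly NaN) instead of zero.
grad_weight_bad = torch.empty(64, 3, 7, 7, dtype=torch.half)
for p in partials:
    grad_weight_bad.add_(p)

# Fixed pattern: start from a zeroed buffer and accumulate in an
# accumulation dtype (float32 here), casting back to the weight dtype
# only at the end.
acc = torch.zeros(64, 3, 7, 7, dtype=torch.float32)
for p in partials:
    acc.add_(p.float())
grad_weight = acc.to(torch.half)
```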
CC ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72157
Reviewed By: malfet
Differential Revision: D34147547
Pulled By: ngimel
fbshipit-source-id: 056c19f727eeef96347db557528272e24eae4223
(cherry picked from commit 24c7f77a81c6ef5b0371ef0030e7003dcce55236)