Fix a GroupNorm CUDA bug when the input does not require grad (#44863)
Summary:
Fix https://discuss.pytorch.org/t/illegal-memory-access-when-i-use-groupnorm/95800
`dX` is a Tensor, not a raw pointer, so comparing `dX` with `nullptr` was wrong.
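For reference, a minimal sketch of the scenario this fixes: the input does not require grad, but the affine parameters do, so backward has to skip the input gradient. The shapes and the helper name `run_groupnorm_backward` are illustrative assumptions, not taken from the forum report, and the snippet falls back to CPU when CUDA is unavailable (the bug itself was CUDA-only).

```python
import torch


def run_groupnorm_backward():
    # Hypothetical helper for illustration; shapes are arbitrary.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    gn = torch.nn.GroupNorm(num_groups=2, num_channels=4).to(device)
    # requires_grad defaults to False, so no gradient is requested for x.
    x = torch.randn(3, 4, 5, 5, device=device)
    # Only the weight and bias gradients should be computed here.
    gn(x).sum().backward()
    return x, gn


x, gn = run_groupnorm_backward()
print(x.grad)               # None: no input gradient was requested
print(gn.weight.grad.shape)
print(gn.bias.grad)
```

Since the loss is a plain sum, each entry of the bias gradient should equal the number of elements per channel (3 * 5 * 5 = 75 for these shapes), which makes a quick sanity check.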
cc BIT-silence who wrote the kernel.
The test couldn't pass with `rtol=0` and `x.requires_grad=True`, so I had to raise `rtol` to `1e-5`.
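To illustrate why `rtol=0` is so strict, here is a rough sketch of the usual allclose-style check, `|actual - expected| <= atol + rtol * |expected|`. The helper `is_close`, the sample values, and the `atol` used are assumptions for illustration only; they are not the values from the actual test.

```python
def is_close(actual, expected, rtol, atol):
    # Hypothetical stand-in for an allclose-style comparison.
    return abs(actual - expected) <= atol + rtol * abs(expected)


expected = 1000.0
actual = 1000.001  # relative error of about 1e-6

# With rtol=0 only the absolute tolerance counts, so large values fail.
strict = is_close(actual, expected, rtol=0.0, atol=1e-5)
# With rtol=1e-5 the allowed error scales with the magnitude of the value.
relaxed = is_close(actual, expected, rtol=1e-5, atol=1e-5)
print(strict, relaxed)
```

With `rtol=0` the bound does not grow with the magnitude of the compared values, so small floating-point differences in the gradients can already exceed it.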
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44863
Reviewed By: mruberry
Differential Revision: D23754101
Pulled By: BIT-silence
fbshipit-source-id: 2eb0134dd489480e5ae7113a7d7b84629104cd49