Relax tolerance for flaky test_nansum_out_dtype_cuda_float32 (#121550)
TestReductionsCUDA.test_nansum_out_dtype_cuda_float32 passes or fails depending on its random inputs. The flakiness was first observed in ROCm internal QA testing, but the same problematic random inputs also break the test on CUDA, as verified on a V100.
Relaxing the tolerance has precedent: another test in the same file already does so.
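
For illustration only, here is a minimal sketch of what a relaxed tolerance means in an allclose-style comparison. It assumes the rule `|actual - expected| <= atol + rtol * |expected|` documented for `torch.testing.assert_close`. The helpers `to_f32`, `nansum_f32`, and `is_close` are hypothetical and stdlib-only; they are not the code or the tolerance values changed by this PR. The float32 accumulator is one plausible way for the error to depend on the inputs, not a diagnosed root cause.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def nansum_f32(values):
    """Sum values with a float32 accumulator, treating NaN as zero."""
    total = 0.0
    for v in values:
        if not math.isnan(v):
            # Round after every addition to mimic float32 accumulation.
            total = to_f32(total + to_f32(v))
    return total


def is_close(actual, expected, rtol, atol):
    """allclose-style check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


if __name__ == "__main__":
    values = [0.1, float("nan"), 0.2, 0.3]
    low_precision = nansum_f32(values)
    reference = math.fsum(v for v in values if not math.isnan(v))
    # The rounding error of the float32 sum varies with the inputs, so a
    # fixed tight tolerance can pass for some inputs and fail for others.
    print(low_precision, reference)
    print(is_close(low_precision, reference, rtol=1.3e-6, atol=1e-5))
```

Raising `rtol` or `atol` widens the accepted band around the reference value, which is what makes a comparison robust to input-dependent rounding error.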
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121550
Approved by: https://github.com/albanD