[ROCM] Enable test_fn_fwgrad_..._functional_binary_cross_entropy on ROCM (#109038)
Fixes #98431
This change addresses a hardware assertion that was triggered on ROCm only tests.
Description of the problem:
Assertion triggered
```
Device-side assertion `target_val >= zero && target_val <= one' failed.
```
The issue in question is due to a GPU side assertion in `binary_cross_entropy_out_cuda` where a `target_val` get's passed to the kernel that does not fall between 0 and 1. The value in question that triggers the assertion is -0.000000000810. The origin of this negative value comes from one of the tensors generated for the test. In this tensor, one of the values (on ROCM) is 0.000000999190 which adhere's to the restriction that it is between 0 and 1. However, this value is eventually passed as a single entry tensor to gradcheck.py::_compute_numerical_gradient
( https://github.com/pytorch/pytorch/blob/main/torch/autograd/gradcheck.py#L347)
This function perturbs the tensor value in-place by subtracting `v` from it and then adding it back. The value of `v` comes from the default `eps` value defined here https://github.com/pytorch/pytorch/blob/main/torch/autograd/gradcheck.py#L2119
Currently pegged at `1e-6`. So what occurs is when an input is less than the default eps (like 0.000000999190 ), the perturbation calculation causes an entry in the tensor to flip to negative, i.e. 0.000000999190 - 1e-6 = -0.000000000810 (due to the subtraction here: https://github.com/pytorch/pytorch/blob/main/torch/autograd/gradcheck.py#L364) which then triggers the device side assertion in `binary_cross_entropy_out_cuda`.
This PR loosens the EPS by an order of magnitude to get around the error. Since this issue has not been caught in the field in any meaningful way, I find this to be an adequate solution, though am happy to hear opposing viewpoints.
Important to mention, while this error was only occurring on ROCm platforms, the issue described is also present in CUDA based environments. The difference being that CUDA doesn't seem to generate a tensor with any values less than `1e-6`. When injecting the small value on an Nvidia box, the same device side assertion was triggered.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109038
Approved by: https://github.com/jeffdaily, https://github.com/albanD