pytorch
97f29bda - Relaxes tolerance on ROCm test_noncontiguous_samples_matmul (#67593)

Commit
3 years ago
Relaxes tolerance on ROCm test_noncontiguous_samples_matmul (#67593)

Summary:
This test is narrowly failing intermittently. See https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.3.1-py3.6-test1/7736//console for an example. Relevant snippet:

```
12:28:43 ======================================================================
12:28:43 FAIL [0.104s]: test_noncontiguous_samples_matmul_cuda_float32 (__main__.TestCommonCUDA)
12:28:43 ----------------------------------------------------------------------
12:28:43 Traceback (most recent call last):
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
12:28:43     method(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
12:28:43     method(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
12:28:43     result = test(self, **param_kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
12:28:43     return test(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 920, in only_fn
12:28:43     return fn(self, *args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1041, in wrapper
12:28:43     fn(*args, **kwargs)
12:28:43   File "test_ops.py", line 262, in test_noncontiguous_samples
12:28:43     self.assertEqual(actual_grad, expected_grad)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
12:28:43     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
12:28:43 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 1 element(s) (out of 10) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 1.2278556823730469e-05 (-1.458460807800293 vs. -1.4584730863571167), which occurred at index 7.
```

Setting an absolute tolerance of 1e-4, which is what this PR does, should make the test pass consistently.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67593

Reviewed By: ngimel

Differential Revision: D32050986

Pulled By: mruberry

fbshipit-source-id: f15bc8c4516be0a859afcfa76d52334c0b2c58a5
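The "narrowly failing" claim in the commit message can be checked by hand from the numbers in the log. The sketch below assumes the comparison uses the usual closeness rule `|actual - expected| <= atol + rtol * |expected|`, that the first value in the log is the actual one and the second the expected one, and that `rtol` stays at 1.3e-06 after the change. The commit message states none of these, and `within_tolerance` is a hypothetical helper, not a PyTorch function.

```python
def within_tolerance(actual, expected, rtol, atol):
    """Hypothetical helper: the standard closeness rule (an assumption here)."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Values copied from the failure message in the commit log.
actual = -1.458460807800293
expected = -1.4584730863571167

# Tolerances reported in the failure: rtol=1.3e-06, atol=1e-05.
old = within_tolerance(actual, expected, rtol=1.3e-06, atol=1e-05)

# Absolute tolerance raised to 1e-4 as the commit describes;
# rtol is assumed unchanged for this illustration.
new = within_tolerance(actual, expected, rtol=1.3e-06, atol=1e-04)

print(abs(actual - expected))  # the observed difference, about 1.23e-05
print(old, new)
```

Under these assumptions the old margin is roughly 1.19e-05, just below the observed difference of about 1.23e-05, while the new margin of roughly 1.02e-04 leaves about an order of magnitude of headroom.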
Author
Mike Ruberry