38ebb776 - Fail with unexpected success for fatal errors (#72016)

Summary:
The rest of the tests in the CUDA test suite are skipped after GPU context corruption is encountered. For tests decorated with `expectedFailure`, this creates the false impression that the entire test suite is passing.

Remedy this by suppressing the exception, so that the test is reported as an unexpected success, if `should_stop_early` is true.

Also, print a warning when this happens (to make attribution easier), as well as when this condition is detected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72016

Test Plan: `python test_ops.py -v -k test_fn_fwgrad_bwgrad_gradient`

Before the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... expected failure

----------------------------------------------------------------------
Ran 3 tests in 0.585s

OK (expected failures=1)
```

After the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... /home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
  warn("Suppressed expected failure that resulted in fatal error")
unexpected success

----------------------------------------------------------------------
Ran 3 tests in 0.595s

FAILED (unexpected successes=1)
```

And `stderr` from the XML file contains the requested info:
```
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
  warn("Suppressed expected failure that resulted in fatal error")
```

Fixes https://github.com/pytorch/pytorch/issues/71973

Reviewed By: janeyx99, ngimel

Differential Revision: D33854287

Pulled By: malfet

fbshipit-source-id: dd0f5a4d2fcd21ebb7ee50ce4ec4914405a812d0
(cherry picked from commit 0c0baf393158b430e938ff3be3f4b59f85620e35)