Add reporting for flaky tests in CI (#68150)
Summary:
This PR does NOT change how signal is displayed in CI; it only reports flaky test stats to RDS. **None of the below will be enabled after landing this PR; enabling will happen in a separate PR via environment variables.**
We report flaky test stats when a test fails on its first run but then succeeds at least once across up to MAX_NUM_RETRIES reruns.
Tests that fail every rerun are assumed to be real failures.
Tests that succeed on the first run are not rerun, even if they were previously noted as flaky.
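The retry policy above can be sketched as follows (a minimal illustration; `classify` and its return labels are hypothetical names, not the actual test runner code):

```python
# Illustrative sketch of the retry policy described above.
# MAX_NUM_RETRIES comes from the description; classify() is hypothetical.
MAX_NUM_RETRIES = 3

def classify(results):
    """Classify a test from the ordered pass/fail results of its runs.

    results: list of booleans, True = pass. The first entry is the
    initial run; later entries (at most MAX_NUM_RETRIES) are reruns.
    """
    if results[0]:
        return "passed"  # first run passed: no reruns happen at all
    if any(results[1:]):
        return "flaky"   # failed first, but at least one rerun passed
    return "failed"      # failed the initial run and every rerun
```

For example, `classify([False, False, True, False])` returns `"flaky"`, which is exactly the pattern `test_async_script_capture` shows in the log below.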
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68150
Test Plan:
First, I modified:
- `test_async_python` to always fail (our "failing test")
- `test_async_future_type_python` to fail 40% of the time
- `test_async_script_capture` to fail 60% of the time

Then, running `python test/test_jit.py -v -k test_async` with `IN_CI` set to 1:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...
Running tests...
----------------------------------------------------------------------
test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.004s)
test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.020s)
test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
test_async_kwargs (jit.test_async.TestAsync) ... ok (0.045s)
test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 3
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 2
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 1
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 0
test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 3
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 2
test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
test_async_script_capture succeeded - num_retries_left: 1
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 0
test_async_script_error (jit.test_async.TestAsync) ... ok (0.040s)
test_async_script_multi_forks (jit.test_async.TestAsync) ... ok (0.025s)
test_async_script_multi_waits (jit.test_async.TestAsync) ... ok (0.009s)
...
======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
self.assertTrue(False)
AssertionError: False is not true
======================================================================
FAIL [0.010s]: test_async_script_capture (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 123, in test_async_script_capture
self.assertTrue(False)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 28 tests in 0.399s
FAILED (failures=2, expected failures=5, unexpected successes=1)
```
This yields the following test report (I changed the extension from .xml to .txt so it can be uploaded here):
[TEST-jit.test_async.TestAsync-20211110222055.txt](https://github.com/pytorch/pytorch/files/7517532/TEST-jit.test_async.TestAsync-20211110222055.txt)
Then running `print_test_stats` correctly excludes the consistently failing test `test_async_python` and calculates red and green counts appropriately:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_script_capture', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 3, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
(pytorch) janeyx@janeyx-mbp pytorch %
```
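For reference, the `num_green`/`num_red` values in the write above can be derived from a test's run outcomes like this (an illustrative sketch, not the real `print_test_stats` logic):

```python
# Hypothetical helper: count green (pass) and red (fail) runs of one test.
def red_green_counts(results):
    """results: list of booleans over all runs of a test, True = pass."""
    num_green = sum(1 for r in results if r)
    num_red = sum(1 for r in results if not r)
    return num_green, num_red

# test_async_script_capture above ran FAIL, FAIL, ok, FAIL:
print(red_green_counts([False, False, True, False]))  # (1, 3)
```

Those counts match the `num_green: 1, num_red: 3` values written for `test_async_script_capture` in the payload above.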
-------------------
If you're curious, I also included the code for when we would like to override the report_only feature and hide flaky signal in CI. With that override, the same test command still correctly fails the test suite, but marks the flaky `test_async_future_type_python` as passed:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...
Running tests...
----------------------------------------------------------------------
test_async_future_type_python (jit.test_async.TestAsync) ... FAIL (0.004s)
test_async_future_type_python failed - num_retries_left: 3
test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.001s)
test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.017s)
test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
test_async_kwargs (jit.test_async.TestAsync) ... ok (0.091s)
test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 3
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 2
test_async_python (jit.test_async.TestAsync) ... FAIL (0.004s)
test_async_python failed - num_retries_left: 1
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 0
test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
test_async_script_error (jit.test_async.TestAsync) ... ok (0.039s)
...
======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
self.assertTrue(False)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 26 tests in 0.390s
FAILED (failures=1, expected failures=4)
```
With this test report:
[TEST-jit.test_async.TestAsync-20211110224810.txt](https://github.com/pytorch/pytorch/files/7517663/TEST-jit.test_async.TestAsync-20211110224810.txt)
And running `print_test_stats`:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_future_type_python', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 1, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
```
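The override behavior can be sketched roughly as follows (hypothetical helper and parameter names, not the actual implementation): with the override on, a failing test that passes a rerun is reported green, while one that fails every rerun still fails the suite.

```python
# Hypothetical sketch of the "hide flaky signal" override; final_outcome
# and report_only are illustrative names, not the actual PyTorch API.
def final_outcome(results, report_only=True):
    """results: ordered pass/fail booleans; the first entry is the initial run."""
    first, reruns = results[0], results[1:]
    if first:
        return "passed"   # first run passed: no reruns happen
    if report_only:
        return "failed"   # report-only mode: CI signal unchanged
    # Override mode: a single green rerun hides the flake from CI.
    return "passed" if any(reruns) else "failed"

# test_async_future_type_python failed, then passed its rerun:
print(final_outcome([False, True], report_only=False))  # passed
# test_async_python failed every rerun and still fails the suite:
print(final_outcome([False, False, False, False], report_only=False))  # failed
```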
Reviewed By: saketh-are
Differential Revision: D32393907
Pulled By: janeyx99
fbshipit-source-id: 37df890481ab84c62809c022dc6338b50972899c