Add reporting for flaky tests in CI (#68150)
Summary:
This PR does NOT change how signal is displayed in CI; it only reports flaky test stats to RDS. **None of the below will be enabled after landing this PR; enabling will happen in a separate PR via environment variables.**
We report flaky test stats when a test fails on its first run but then succeeds at least once across up to MAX_NUM_RETRIES reruns.
Tests that fail every rerun are assumed to be real failures.
Tests that succeed on the first run are not rerun, even if they were previously noted as flaky.
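The retry policy above can be sketched as follows (a minimal illustration; `classify` and its return labels are hypothetical names, not the actual test runner code):

```python
# Illustrative sketch of the retry policy described above.
# MAX_NUM_RETRIES comes from the description; classify() is hypothetical.
MAX_NUM_RETRIES = 3

def classify(results):
    """Classify a test from the ordered pass/fail results of its runs.

    results: list of booleans, True = pass. The first entry is the
    initial run; later entries (at most MAX_NUM_RETRIES) are reruns.
    """
    if results[0]:
        return "passed"  # first run passed: no reruns happen at all
    if any(results[1:]):
        return "flaky"   # failed first, but at least one rerun passed
    return "failed"      # failed the initial run and every rerun
```

For example, `classify([False, False, True, False])` returns `"flaky"`, which is exactly the pattern `test_async_script_capture` shows in the log below.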
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68150
Test Plan:
First, I modified:
- `test_async_python` to always fail (our "failing test")
- `test_async_future_type_python` to fail 40% of the time
- `test_async_script_capture` to fail 60% of the time

Then, running `python test/test_jit.py -v -k test_async` with `IN_CI` set to 1:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...
Running tests...
----------------------------------------------------------------------
test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.004s)
test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.020s)
test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
test_async_kwargs (jit.test_async.TestAsync) ... ok (0.045s)
test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 3
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 2
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 1
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 0
test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 3
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 2
test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
test_async_script_capture succeeded - num_retries_left: 1
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 0
test_async_script_error (jit.test_async.TestAsync) ... ok (0.040s)
test_async_script_multi_forks (jit.test_async.TestAsync) ... ok (0.025s)
test_async_script_multi_waits (jit.test_async.TestAsync) ... ok (0.009s)
...
======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
self.assertTrue(False)
AssertionError: False is not true
======================================================================
FAIL [0.010s]: test_async_script_capture (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 123, in test_async_script_capture
self.assertTrue(False)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 28 tests in 0.399s
FAILED (failures=2, expected failures=5, unexpected successes=1)
```
This yields the following test report (I changed the extension from .xml to .txt so it can be uploaded here):
[TEST-jit.test_async.TestAsync-20211110222055.txt](https://github.com/pytorch/pytorch/files/7517532/TEST-jit.test_async.TestAsync-20211110222055.txt)
Then running `print_test_stats` correctly excludes the consistently failing test `test_async_python` and calculates red and green counts appropriately:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_script_capture', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 3, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
(pytorch) janeyx@janeyx-mbp pytorch %
```
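For reference, the `num_green`/`num_red` values in the write above can be derived from a test's run outcomes like this (an illustrative sketch, not the real `print_test_stats` logic):

```python
# Hypothetical helper: count green (pass) and red (fail) runs of one test.
def red_green_counts(results):
    """results: list of booleans over all runs of a test, True = pass."""
    num_green = sum(1 for r in results if r)
    num_red = sum(1 for r in results if not r)
    return num_green, num_red

# test_async_script_capture above ran FAIL, FAIL, ok, FAIL:
print(red_green_counts([False, False, True, False]))  # (1, 3)
```

Those counts match the `num_green: 1, num_red: 3` values written for `test_async_script_capture` in the payload above.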
-------------------
If you're curious, I also included the code for when we would like to override the report_only feature and hide flaky signal in CI. With that override, the same test command still correctly fails the test suite, but marks the flaky `test_async_future_type_python` as passed:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...
Running tests...
----------------------------------------------------------------------
test_async_future_type_python (jit.test_async.TestAsync) ... FAIL (0.004s)
test_async_future_type_python failed - num_retries_left: 3
test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.001s)
test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.017s)
test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
test_async_kwargs (jit.test_async.TestAsync) ... ok (0.091s)
test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 3
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 2
test_async_python (jit.test_async.TestAsync) ... FAIL (0.004s)
test_async_python failed - num_retries_left: 1
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 0
test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
test_async_script_error (jit.test_async.TestAsync) ... ok (0.039s)
...
======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
self.assertTrue(False)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 26 tests in 0.390s
FAILED (failures=1, expected failures=4)
```
With this test report:
[TEST-jit.test_async.TestAsync-20211110224810.txt](https://github.com/pytorch/pytorch/files/7517663/TEST-jit.test_async.TestAsync-20211110224810.txt)
And running `print_test_stats`:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_future_type_python', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 1, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
```
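The override behavior can be sketched roughly as follows (hypothetical helper and parameter names, not the actual implementation): with the override on, a failing test that passes a rerun is reported green, while one that fails every rerun still fails the suite.

```python
# Hypothetical sketch of the "hide flaky signal" override; final_outcome
# and report_only are illustrative names, not the actual PyTorch API.
def final_outcome(results, report_only=True):
    """results: ordered pass/fail booleans; the first entry is the initial run."""
    first, reruns = results[0], results[1:]
    if first:
        return "passed"   # first run passed: no reruns happen
    if report_only:
        return "failed"   # report-only mode: CI signal unchanged
    # Override mode: a single green rerun hides the flake from CI.
    return "passed" if any(reruns) else "failed"

# test_async_future_type_python failed, then passed its rerun:
print(final_outcome([False, True], report_only=False))  # passed
# test_async_python failed every rerun and still fails the suite:
print(final_outcome([False, False, False, False], report_only=False))  # failed
```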
Reviewed By: saketh-are
Differential Revision: D32393907
Pulled By: janeyx99
fbshipit-source-id: 37df890481ab84c62809c022dc6338b50972899c