Recommit: caffe2/plan_executor: wait for 1 minute after exception and then abort (#45981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45981
This is a recommit of the previously reverted D20850851 (https://github.com/pytorch/pytorch/commit/3fbddb92b1be1f70edced886745116b8daeebb17).
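TL;DR: combining condition_variables and atomics is a bad idea, because a notification can be lost when the flag is updated without holding the mutex the waiter uses. See: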
https://stackoverflow.com/questions/49622713/c17-atomics-and-condition-variable-deadlock
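For illustration only, here is a minimal sketch of the safe pattern. It is not the code from this diff: ErrorSignal, set(), waitFor() and signalIsObserved() are hypothetical names. The general hazard is that a waiter can read an atomic flag as false, the signalling thread can then set the flag and call notify before the waiter has actually blocked, and the waiter sleeps through the only notification. Writing the flag while holding the same mutex the waiter uses closes that window.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper for illustration; not taken from plan_executor.
class ErrorSignal {
 public:
  // Record the failure. The flag is written while holding mutex_, so a
  // waiter cannot check the flag and then miss the notification.
  void set() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      failed_ = true;
    }
    cv_.notify_all();
  }

  // Block until set() has been called or the timeout elapses.
  // Returns true if the failure was observed, false on timeout.
  bool waitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return failed_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool failed_ = false;  // plain bool: every access is guarded by mutex_
};

// Hypothetical driver: a worker thread reports a failure, the caller waits.
bool signalIsObserved() {
  ErrorSignal signal;
  std::thread worker([&signal] { signal.set(); });
  const bool observed = signal.waitFor(std::chrono::seconds(5));
  worker.join();
  assert(observed && "set() must wake the waiter");
  return observed;
}
```

Because the predicate is re-checked under the lock, waitFor() returns true even if set() runs before the waiter starts waiting, and it returns false only when the timeout expires with the flag still unset.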
This also adds some #ifdefs to disable the death test for mobile, xplat, and TSAN builds, since forking doesn't play nicely with them.
Test Plan:
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 1000 test_atomic_iter_with_concurrent_steps --timeout 120
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 100
buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
No timeouts: https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874440059883/
Will also ensure there are no timeouts in OSS.
Reviewed By: walterddr, dahsh
Differential Revision: D24165505
fbshipit-source-id: 17cd23bfbcd9c2826a4067a387023d5186353196