Allow forking until a worker thread is created in autograd engine (#72689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72689
Fix https://github.com/pytorch/pytorch/issues/69839
Should we add a private Python binding to check whether the bad fork guard has been set, and add a test in CI to make sure that it is never set on our CPU-only CI build? Not sure how flaky that would be outside of CI for people who run a CPU build on a machine that has CUDA installed...
EDIT: it turns out we already had such tests in test_multiprocessing, so this should be tested and enforced now!
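For illustration only, here is a toy Python sketch of the idea in the title. It is not the autograd engine's code, and `ToyEngine`, `run_inline`, `run_on_worker`, and `bad_fork` are hypothetical names. It assumes the guard is flipped at the moment a worker thread is created rather than when the engine is constructed, so work done on the calling thread leaves forking allowed.

```python
import threading


class ToyEngine:
    """Hypothetical stand-in for an engine that starts worker threads lazily."""

    def __init__(self):
        # Guard flag: True once forking is no longer considered safe.
        # Constructing the engine does not set it.
        self.bad_fork = False

    def run_inline(self, fn):
        # Work on the calling thread creates no worker thread,
        # so the guard stays clear and forking is still allowed.
        return fn()

    def run_on_worker(self, fn):
        # Creating a worker thread is the point after which fork() is unsafe,
        # so the guard is set here rather than in __init__.
        self.bad_fork = True
        result = []
        worker = threading.Thread(target=lambda: result.append(fn()))
        worker.start()
        worker.join()
        return result[0]


engine = ToyEngine()
engine.run_inline(lambda: 2 + 3)
print(engine.bad_fork)  # guard untouched by inline work
engine.run_on_worker(lambda: 7)
print(engine.bad_fork)  # guard set once a worker thread was created
```

A test in this style would only need to check that the flag is still clear after work that never creates a worker thread.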
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D34180243
Pulled By: albanD
fbshipit-source-id: 3284db52dcf4568362244b60e3c5657153e64fa4
(cherry picked from commit 6e23f7a33a065c2ab6a267b2c7f0ca97c24532ea)