Fix init-shutdown race condition in autograd engine (#39194)
Summary:
If the Engine is created shortly before the application exits, the non-reentrant thread might not get a chance to spawn, which results in an infinite wait in `Engine::~Engine()`.
Prevent this by waiting for the threads to spawn before returning from `Engine::start_device_threads()`.
Make sure that the thread count is incremented before the GIL is acquired in PythonThread.
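The handshake described above can be sketched roughly as follows. This is a minimal illustration, not the actual engine code: `MiniEngine`, `start_threads`, `worker_main`, `live_workers` and `run_demo` are hypothetical names, and a single mutex with three condition variables is an assumed simplification. Each worker registers itself as the very first thing it does, before any other setup that could block, and the starter returns only once every worker has registered.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the engine; names and layout are assumptions.
class MiniEngine {
 public:
  // Spawns n detached workers and returns only after all of them have
  // registered themselves. Intended to be called once per instance.
  void start_threads(int n) {
    for (int i = 0; i < n; ++i) {
      std::thread([this] { worker_main(); }).detach();
    }
    std::unique_lock<std::mutex> lock(mutex_);
    started_cv_.wait(lock, [&] { return started_ == n; });
    assert(live_ == n);  // every worker is counted before we return
  }

  int live_workers() {
    std::lock_guard<std::mutex> lock(mutex_);
    return live_;
  }

  ~MiniEngine() {
    std::unique_lock<std::mutex> lock(mutex_);
    shutdown_ = true;
    work_cv_.notify_all();
    // Terminates because every spawned worker is already counted in live_
    // and will decrement it on its way out.
    exited_cv_.wait(lock, [&] { return live_ == 0; });
  }

 private:
  void worker_main() {
    std::unique_lock<std::mutex> lock(mutex_);
    // Register first, before any further (possibly blocking) setup.
    ++started_;
    ++live_;
    started_cv_.notify_one();
    // Sleep until the destructor requests shutdown.
    work_cv_.wait(lock, [&] { return shutdown_; });
    --live_;
    exited_cv_.notify_one();
  }

  std::mutex mutex_;
  std::condition_variable started_cv_;
  std::condition_variable work_cv_;
  std::condition_variable exited_cv_;
  int started_ = 0;
  int live_ = 0;
  bool shutdown_ = false;
};

// Creates an engine immediately before destroying it, which is the
// short-lived case the summary describes.
int run_demo(int n) {
  MiniEngine engine;
  engine.start_threads(n);
  return engine.live_workers();
}
```

In this sketch `run_demo(3)` returns 3 and the destructor completes, because no worker can still be unregistered once `start_threads` has returned.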
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39194
Differential Revision: D21789219
Pulled By: malfet
fbshipit-source-id: d9b5e74d5ddeb2474b575af2e4f33d022efcfe53