Fix destructor ordering for CUDA handle pools (#39345)
Summary:
Possible fix for gh-38385. Unfortunately, I haven't been able to reproduce the issue reliably, so I can't say for certain that this resolves it.
Since this appears to be a destruction-ordering issue, I've focused on making the destructor calls well-ordered:
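- Each pool is now a function-local `static` instead of a global variable. Because function-local statics are initialized on first use and destroyed in reverse order of initialization, this ensures the pool's destructor runs before any relevant PyTorch global state that was initialized earlier is destroyed.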
- Each pool window now stores only a `std::weak_ptr` to the global pool, so it can't extend the pool's lifetime beyond the normal destructor ordering. The trade-off is that if the `weak_ptr` has already expired when a window is destroyed, its handles are leaked. However, that shouldn't happen under normal use.
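
For illustration, the two points above can be sketched roughly as follows. This is a simplified stand-in, not the code from this PR: `FakeHandle`, `HandlePool`, `get_pool`, `PoolWindow`, and `idle_count` are hypothetical names, and a plain integer id takes the place of a real CUDA library handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a CUDA library handle.
struct FakeHandle {
  int id = -1;
};

// Hypothetical pool that hands out handles and takes them back for reuse.
class HandlePool {
 public:
  FakeHandle acquire() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (!free_.empty()) {
      FakeHandle h = free_.back();
      free_.pop_back();
      return h;
    }
    return FakeHandle{next_id_++};  // a real pool would create a handle here
  }

  void release(FakeHandle h) {
    std::lock_guard<std::mutex> guard(mutex_);
    free_.push_back(h);
  }

  std::size_t idle_count() {
    std::lock_guard<std::mutex> guard(mutex_);
    return free_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<FakeHandle> free_;
  int next_id_ = 0;
};

// Function-local static: initialized on the first call and destroyed in
// reverse order of initialization at program exit.
const std::shared_ptr<HandlePool>& get_pool() {
  static std::shared_ptr<HandlePool> pool = std::make_shared<HandlePool>();
  return pool;
}

// The window keeps only a weak_ptr, so it never prolongs the pool's lifetime.
class PoolWindow {
 public:
  explicit PoolWindow(const std::shared_ptr<HandlePool>& pool) : pool_(pool) {
    assert(pool != nullptr);
    handle_ = pool->acquire();
  }

  PoolWindow(const PoolWindow&) = delete;
  PoolWindow& operator=(const PoolWindow&) = delete;

  ~PoolWindow() {
    // Return the handle only if the pool is still alive. If the weak_ptr has
    // expired, the handle is dropped (with a real handle, this is the leak).
    if (auto pool = pool_.lock()) {
      pool->release(handle_);
    }
  }

  int handle_id() const { return handle_.id; }

 private:
  std::weak_ptr<HandlePool> pool_;
  FakeHandle handle_;
};
```

In this sketch, a window that outlives the pool simply skips the release step rather than touching a destroyed object, which is the trade-off the second bullet describes.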
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39345
Differential Revision: D22044376
Pulled By: ezyang
fbshipit-source-id: da1713b42c143ed1452a6edf1ecb05cd45743c7a