[PG NCCL] catch cuda lib runtime error - driver shutting down (#74258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74258
In some cases the PG cleanup thread checks CUDA event status after the CUDA runtime library has been unloaded, which leads to a "driver shutting down" error. This usually happens when a CUDA API is called from the destructor of a global or static object. This change catches that error.
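The catch-and-filter pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `isDriverShuttingDown`, `eventFinishedOrShutdown`, and the `queryEvent` callback are hypothetical names, and the callback stands in for whatever wrapper queries the CUDA event and throws on failure. The sketch assumes the error surfaces as an exception whose message contains "driver shutting down".

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if the exception message indicates that the
// CUDA runtime has already been unloaded.
inline bool isDriverShuttingDown(const std::exception& e) {
  return std::string(e.what()).find("driver shutting down") !=
      std::string::npos;
}

// Hypothetical stand-in for the cleanup thread's event check.
// `queryEvent` represents a call that queries a CUDA event and throws
// on any CUDA error (assumption: errors arrive as std::exception).
inline bool eventFinishedOrShutdown(const std::function<bool()>& queryEvent) {
  try {
    return queryEvent();
  } catch (const std::exception& e) {
    if (!isDriverShuttingDown(e)) {
      throw;  // Unrelated errors still propagate to the caller.
    }
    // The runtime is gone, so there is nothing left to query;
    // report "not finished" instead of crashing during teardown.
    return false;
  }
}
```

Only the shutdown error is swallowed here; any other failure is rethrown, so genuine CUDA errors stay visible.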
Test Plan: wait for user
Reviewed By: jiayisuse, osalpekar
Differential Revision: D34904896
fbshipit-source-id: 705c0812132dae97ea55fcb22730557880ca35e1
(cherry picked from commit ecb5f14a022319402c509b86209f6205212956b7)