pytorch
ec071a08 - [PG NCCL] catch cuda lib runtime error - driver shutting down (#74258)

Commit
2 years ago
[PG NCCL] catch cuda lib runtime error - driver shutting down (#74258)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74258

There is a case where the PG cleanup thread checks CUDA event status after the CUDA runtime library has been unloaded. When that happens, the check fails with a "driver shutting down" error. This issue usually occurs when a CUDA API is called in the destructor of a global or static object.

Test Plan: wait for user

Reviewed By: jiayisuse, osalpekar

Differential Revision: D34904896

fbshipit-source-id: 705c0812132dae97ea55fcb22730557880ca35e1
(cherry picked from commit ecb5f14a022319402c509b86209f6205212956b7)
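The kind of guard the summary describes might look roughly like the sketch below. This is not the code from the commit: `isShutdownError` and `queryIgnoringShutdown` are hypothetical names, the status check is passed in as a plain callable so the example runs without CUDA, and matching on the "driver shutting down" substring and treating the work as finished are both assumptions made for illustration.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a PyTorch API): true when the exception message
// contains the "driver shutting down" text quoted in the commit summary.
bool isShutdownError(const std::exception& e) {
  return std::string(e.what()).find("driver shutting down") !=
      std::string::npos;
}

// Hypothetical wrapper around an event-status check run by a cleanup thread.
// `query` returns true when the work is complete and may throw on failure.
// Assumption: if the failure is only the shutdown error, the process is
// exiting anyway, so the work is reported as finished instead of failing.
bool queryIgnoringShutdown(const std::function<bool()>& query) {
  try {
    return query();
  } catch (const std::exception& e) {
    if (!isShutdownError(e)) {
      throw;  // any other error is still propagated to the caller
    }
    return true;  // runtime already unloaded: nothing left to clean up
  }
}
```

A cleanup loop could call `queryIgnoringShutdown` with a lambda that performs the real event query. Errors unrelated to shutdown are rethrown unchanged, so the guard hides only the failure mode described in the summary.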