pytorch
54c05fa3 - Add basic GPU support to distributed autograd. (#40312)

Commit

4 years ago

Add basic GPU support to distributed autograd. (#40312)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40312

As part of https://github.com/pytorch/pytorch/issues/40255, we realized that GPU support for distributed autograd was broken as part of our multithreaded autograd change.

To fix this in the short term for 1.6, this PR includes the following changes:

1) Long lived CPU thread in DistEngine to execute GPU->CPU continuations in the autograd graph.
2) The long lived CPU thread has its own ready_queue and this queue is used for all GraphTasks created by DistEngine.
3) In thread_main(), the CPU thread cannot exit once the GraphTask is done processing because of the new CPU thread added in 1).
4) To resolve this, thread_main() now has a parameter `device_thread` instead of `reentrant_thread`. When device_thread is True, we expect this to be a long lived device thread that does not exit.
5) When device_thread is False, thread_main is expected to run a GraphTask and return once done.

ghstack-source-id: 106391329

Test Plan: waitforbuildbot

Differential Revision: D22146183

fbshipit-source-id: dd146b7a95f55db75f6767889b7255e9d62d5825
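The two modes of thread_main() described in points 4) and 5) can be illustrated with a small Python sketch. This is not the C++ autograd engine code: it only mimics the control flow, assuming a worker that pulls work from a ready queue. The names thread_main, device_thread, GraphTask, and ready_queue come from the commit message; the SHUTDOWN sentinel, mark_finished(), and run_demo() are hypothetical helpers added for illustration.

```python
import queue
import threading

SHUTDOWN = object()  # hypothetical sentinel used to stop the long-lived thread


class GraphTask:
    """Toy stand-in for a graph task: tracks how many work items remain."""

    def __init__(self, num_items):
        self.outstanding = num_items
        self.results = []
        self.lock = threading.Lock()
        self.done = threading.Event()

    def mark_finished(self, value):
        with self.lock:
            self.results.append(value)
            self.outstanding -= 1
            if self.outstanding == 0:
                self.done.set()


def thread_main(ready_queue, graph_task=None, device_thread=False):
    """Drain ready_queue.

    device_thread=True: long-lived worker that keeps serving tasks and only
    stops on SHUTDOWN. device_thread=False: run graph_task and return once
    it is done.
    """
    while True:
        if not device_thread and graph_task.done.is_set():
            return
        item = ready_queue.get()
        if item is SHUTDOWN:
            return
        task, fn = item
        task.mark_finished(fn())


def run_demo():
    # Long-lived thread with its own ready queue, shared by several tasks.
    cpu_queue = queue.Queue()
    worker = threading.Thread(
        target=thread_main, args=(cpu_queue,),
        kwargs={"device_thread": True}, daemon=True,
    )
    worker.start()

    first = GraphTask(2)
    cpu_queue.put((first, lambda: 1))
    cpu_queue.put((first, lambda: 2))
    first.done.wait(timeout=5)
    alive_after_first = worker.is_alive()  # worker does not exit with the task

    second = GraphTask(1)
    cpu_queue.put((second, lambda: 3))
    second.done.wait(timeout=5)

    cpu_queue.put(SHUTDOWN)
    worker.join(timeout=5)

    # Non-device mode: runs one task on the calling thread, then returns.
    local_queue = queue.Queue()
    third = GraphTask(1)
    local_queue.put((third, lambda: 4))
    thread_main(local_queue, third, device_thread=False)

    return alive_after_first, first.results, second.results, third.results
```

In this sketch the long-lived worker survives the completion of the first task and goes on to serve the second from the same queue, while the non-device call returns as soon as its single task is done.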

Author

pritamdamania

Committer

facebook-github-bot

Parents

e509c58a
