Use c10 threadpool for GPU to CPU distributed autograd continuations. (#42511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42511
DistEngine currently uses only a single thread to execute GPU to CPU
continuations as part of the backward pass. This becomes a significant
performance bottleneck in cases where such continuations exist and we would
like to execute them using all CPU cores.
To alleviate this, in this PR the single thread in DistEngine only dequeues
work from the global queue and then hands off execution of that work to the
c10 threadpool, where we call "execute_graph_task_until_ready_queue_empty".
For more context please see:
https://github.com/pytorch/pytorch/issues/40255#issuecomment-663298062.
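The dispatch pattern described above (one dedicated thread that only dequeues, with execution fanned out to a shared pool) can be sketched in Python. This is a hypothetical illustration, not the actual C++ implementation; `execute_graph_task` here is a stand-in for `execute_graph_task_until_ready_queue_empty`, and the queue, pool, and sentinel shutdown are assumptions for the sketch.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

task_queue = queue.Queue()        # stand-in for DistEngine's global ready queue
pool = ThreadPoolExecutor()       # stand-in for the c10 threadpool
results = []
results_lock = threading.Lock()

def execute_graph_task(task_id):
    # Stand-in for execute_graph_task_until_ready_queue_empty: the actual
    # continuation work runs here, on a pool worker, not on the dequeue thread.
    with results_lock:
        results.append(task_id)

def dequeue_loop():
    # The lone dispatcher thread: it only dequeues and hands off, so the
    # pool's workers (one per CPU core) do the heavy lifting in parallel.
    while True:
        task = task_queue.get()
        if task is None:          # sentinel signals shutdown
            break
        pool.submit(execute_graph_task, task)

dispatcher = threading.Thread(target=dequeue_loop)
dispatcher.start()
for i in range(8):
    task_queue.put(i)
task_queue.put(None)
dispatcher.join()
pool.shutdown(wait=True)
```

With this split, the dispatcher never blocks on task execution, so slow continuations cannot serialize the backward pass behind a single thread.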
ghstack-source-id: 109997718
Test Plan: waitforbuildbot
Reviewed By: albanD
Differential Revision: D22917579
fbshipit-source-id: c634b6c97f3051f071fd7b994333e6ecb8c54155