Switch autograd to use a pool of workers for each device (#21911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21911
ghimport-source-id: 3b7d37481201aa4b4ca8f7767603d0dfd13f871f
Test Plan:
Tested on https://github.com/pytorch/pytorch/issues/6959 and ensured no Recursion Error.
Performance testing:
[word_language_model](https://gist.github.com/malvika2147/34c214871d549f9275812f2d20506990) (no significant change)
[mnist](https://gist.github.com/malvika2147/77890eef102099490a1029122fb20dd0) (no significant change)
[Comparison of performance](https://gist.github.com/malvika2147/c0a8790910b8513bd2e20b224bdd6300) on https://github.com/pytorch/pytorch/issues/6959 with smaller inputs. (slower by about ~25%, expected)
Imported from OSS
Differential Revision: D15985852
fbshipit-source-id: ca172690857fd1718462b80f3a244af9d8825d6c