Modify PyTorch's integration of NNPACK to use a unified underlying thread pool implementation. (#27341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27341
Multi-threaded:
```
Pixel 2:
  Before:   362.716
  PR-27402: 185.799
  PR-27341: 142.011
Pixel 3:
  Before:   246.755
  PR-27402: 160.045
  PR-27341: 115.437
```
Single-threaded:
```
Pixel 2:
  Before:   308.084
  PR-27340: 303.539
  PR-27341: 313.558
Pixel 3:
  Before:   234.272
  PR-27340: 227.158
  PR-27341: 232.787
```
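For a quick read of the tables above, the relative change of PR-27341 against the baseline can be computed as a simple ratio. This is a minimal sketch, not part of the change itself: it assumes the figures are latencies where lower is better (the units are not stated here), and the names `RESULTS`, `speedup`, and `summarize` are hypothetical helpers introduced only for illustration.

```python
# Figures copied from the benchmark tables in this commit message.
# Assumption: each value is a latency (lower is better); units are not stated.
RESULTS = {
    "multi-threaded": {
        "Pixel 2": {"Before": 362.716, "PR-27341": 142.011},
        "Pixel 3": {"Before": 246.755, "PR-27341": 115.437},
    },
    "single-threaded": {
        "Pixel 2": {"Before": 308.084, "PR-27341": 313.558},
        "Pixel 3": {"Before": 234.272, "PR-27341": 232.787},
    },
}


def speedup(before: float, after: float) -> float:
    """Return before / after; values above 1.0 mean the change is faster."""
    return before / after


def summarize(results):
    """Map (mode, device) to the speedup of PR-27341 over the baseline."""
    return {
        (mode, device): speedup(nums["Before"], nums["PR-27341"])
        for mode, devices in results.items()
        for device, nums in devices.items()
    }


if __name__ == "__main__":
    for (mode, device), ratio in summarize(RESULTS).items():
        print(f"{mode:16s} {device}: {ratio:.2f}x")
```

Under that assumption, the multi-threaded ratios come out at roughly 2.55x on Pixel 2 and 2.14x on Pixel 3, while the single-threaded ratios stay within about 2% of the baseline.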
Test Plan: Imported from OSS
Differential Revision: D17835333
Pulled By: AshkanAliabadi
fbshipit-source-id: 9502c230d8567b141ae93f611ac524d855ed9bdf