optimize UpSampleNearest 1d, 2d and 3d performance on CPU (#31452)
Summary:
This PR improves `UpSample` performance with `mode='nearest'` for 1D, 2D and 3D inputs, covering both inference and training. The current implementation in ATen is not parallelized.
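The operation itself is simple. Below is a minimal pure-Python sketch of nearest-neighbor upsampling. It assumes the legacy `mode='nearest'` index rule `src = min(floor(dst * in_size / out_size), in_size - 1)`. The function names are hypothetical, and this is not ATen's kernel, which works on strided tensors and can also take an explicit scale factor.

```python
import math

def nearest_source_index(dst_index, in_size, out_size):
    # Assumed legacy 'nearest' rule: scale = in/out, floor, then clamp to the last input.
    scale = in_size / out_size
    return min(math.floor(dst_index * scale), in_size - 1)

def upsample_nearest_1d(row, out_size):
    in_size = len(row)
    return [row[nearest_source_index(i, in_size, out_size)] for i in range(out_size)]

def upsample_nearest_2d(plane, out_h, out_w):
    in_h, in_w = len(plane), len(plane[0])
    # The width mapping is the same for every output row, so compute it once.
    w_idx = [nearest_source_index(j, in_w, out_w) for j in range(out_w)]
    out = []
    for i in range(out_h):
        src_row = plane[nearest_source_index(i, in_h, out_h)]
        out.append([src_row[j] for j in w_idx])
    return out

if __name__ == "__main__":
    print(upsample_nearest_1d([1, 2, 3], 6))            # [1, 1, 2, 2, 3, 3]
    print(upsample_nearest_2d([[1, 2], [3, 4]], 4, 4))  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The 3D case follows the same pattern with one more axis. Each output element is a pure gather from the input, so independent planes can be processed concurrently without synchronization.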
1. Single-socket inference speedup for 1D, 2D and 3D: **63x, 57x, 46x**.
2. Single-core inference speedup for 1D, 2D and 3D: **5.9x, 4.6x, 3.4x**.
3. Dual-socket training speedup for 1D, 2D and 3D: **38x, 33x, 65x**.
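The multi-core figures depend on splitting the work across threads. Below is a minimal sketch of one plausible scheme, assuming the flattened batch-times-channel planes are divided into contiguous chunks and each plane is written by exactly one worker. The names `split_range`, `parallel_for` and `upsample_rows_2x` are hypothetical. The chunking only mirrors the idea behind ATen's `at::parallel_for`, not its actual scheduling or grain-size logic. In CPython the GIL prevents these threads from speeding up pure-Python loops, so this illustrates the partitioning, not the measured speedups.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def split_range(begin, end, num_workers):
    """Split [begin, end) into at most num_workers contiguous chunks."""
    if end <= begin:
        return []
    chunk = math.ceil((end - begin) / num_workers)
    return [(lo, min(lo + chunk, end)) for lo in range(begin, end, chunk)]

def parallel_for(begin, end, num_workers, fn):
    """Run fn(lo, hi) on each chunk concurrently; chunks must write disjoint outputs."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(fn, lo, hi) for lo, hi in split_range(begin, end, num_workers)]
        for f in futures:
            f.result()  # re-raise any exception from a worker

def upsample_rows_2x(rows, num_workers=4):
    """Nearest 2x upsampling of each row, parallelized over the outer (N*C) dimension."""
    out = [None] * len(rows)

    def kernel(lo, hi):
        for p in range(lo, hi):
            # With an exact 2x scale, output index i reads input index i // 2,
            # so every input element is repeated twice.
            out[p] = [x for x in rows[p] for _ in range(2)]

    parallel_for(0, len(rows), num_workers, kernel)
    return out
```

For example, `split_range(0, 10, 4)` yields `[(0, 3), (3, 6), (6, 9), (9, 10)]`. Splitting over whole planes keeps each worker's writes contiguous and avoids any locking.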
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31452
Differential Revision: D20077828
Pulled By: VitalyFedyunin
fbshipit-source-id: a7815cf2ae344696067d2ec63bd4f4e858eaafff