optimize upsample performance for linear modes on CPU (#34864)
Summary:
This PR improves the CPU performance of `nn.Upsample()` for the `linear`, `bilinear`, and `trilinear` modes.
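The three modes covered here are the same interpolation applied along one, two, or three spatial dimensions, so each expects a different input rank. The sketch below shows this with the public `nn.Upsample` and `F.interpolate` APIs. The tensor shapes and `scale_factor=2` are arbitrary illustrative choices, not values taken from this PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Input rank per mode (illustrative shapes):
#   linear    -> 3-D input (N, C, W)
#   bilinear  -> 4-D input (N, C, H, W)
#   trilinear -> 5-D input (N, C, D, H, W)
x1 = torch.randn(2, 3, 8)
x2 = torch.randn(2, 3, 8, 8)
x3 = torch.randn(2, 3, 4, 8, 8)

up1 = nn.Upsample(scale_factor=2, mode="linear", align_corners=False)
up2 = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
up3 = nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False)

y1 = up1(x1)  # shape (2, 3, 16)
y2 = up2(x2)  # shape (2, 3, 16, 16)
y3 = up3(x3)  # shape (2, 3, 8, 16, 16)

# nn.Upsample is a module wrapper around F.interpolate with the same arguments
y2_functional = F.interpolate(x2, scale_factor=2, mode="bilinear", align_corners=False)
print(y1.shape, y2.shape, y3.shape)
```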
Measured speedups:
- Single-socket inference: up to **31x**
- Single-core inference: up to **1.8x**
- Dual-socket training: up to **28x**
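One way to compare single-core and multi-threaded CPU timings of these kernels is to pin the intra-op thread count with `torch.set_num_threads`. This is a minimal sketch under stated assumptions, not the benchmark used for the numbers above: the helper name `time_upsample`, the input shape, and the iteration count are hypothetical; it measures forward passes only (no backward, so it does not cover training); and it does not handle socket or NUMA pinning (e.g. via `numactl`).

```python
import time
import torch
import torch.nn.functional as F

def time_upsample(x, mode, num_threads, iters=10):
    """Hypothetical helper: mean seconds per forward F.interpolate call
    using the given number of intra-op threads."""
    prev = torch.get_num_threads()
    torch.set_num_threads(num_threads)
    try:
        with torch.no_grad():
            # Warm-up call so one-time setup cost is not timed
            F.interpolate(x, scale_factor=2, mode=mode, align_corners=False)
            start = time.perf_counter()
            for _ in range(iters):
                F.interpolate(x, scale_factor=2, mode=mode, align_corners=False)
            elapsed = time.perf_counter() - start
    finally:
        # Restore the caller's thread setting
        torch.set_num_threads(prev)
    return elapsed / iters

x = torch.randn(4, 32, 64, 64)  # arbitrary (N, C, H, W) input
n = torch.get_num_threads()
single = time_upsample(x, "bilinear", num_threads=1)
multi = time_upsample(x, "bilinear", num_threads=n)
print(f"1 thread: {single * 1e3:.2f} ms, {n} threads: {multi * 1e3:.2f} ms")
```

Absolute numbers depend heavily on the machine and PyTorch build, so this only illustrates the measurement setup.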
A kernel for the `channels_last` memory format is also provided.
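A `channels_last` input stores an NCHW tensor with channels as the fastest-varying dimension (NHWC in memory). The sketch below assumes a recent PyTorch build. It checks that interpolating a `channels_last` tensor gives the same values as the default contiguous layout, and it reports whether the output keeps the `channels_last` layout. The expectation that it does is an assumption about current releases, so verify it on your version. Shapes and tolerance are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 16, 16)                           # default (NCHW-contiguous) layout
x_cl = x.contiguous(memory_format=torch.channels_last)  # same values, NHWC memory order

y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
y_cl = F.interpolate(x_cl, scale_factor=2, mode="bilinear", align_corners=False)

# Memory format should not change the result (small tolerance for float rounding)
same_values = torch.allclose(y, y_cl, atol=1e-5)

# Expected True on builds with a channels_last kernel (assumption; check your version)
keeps_layout = y_cl.is_contiguous(memory_format=torch.channels_last)
print(same_values, keeps_layout)
```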
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34864
Differential Revision: D20772990
Pulled By: ngimel
fbshipit-source-id: a48307f2072227f20e742ebbd4a093bb29537d19