enable channels last for reflection padding on CPU (#102518)
Add channels last support for reflection padding on CPU. The following test cases will pass with this patch:
```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad3d_cpu_float32
```
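For readers unfamiliar with the layout, the sketch below illustrates what reflection padding does to a single image stored in channels-last (NHWC) order. It is a pure-Python illustration, not the CPU kernel added by this patch; the names `reflect_index` and `reflection_pad2d_nhwc` are hypothetical, and it assumes the same padding on all four sides with `pad` smaller than both `H` and `W`. Because the `C` values of a pixel are adjacent in memory in this layout, each output pixel is filled by copying one contiguous run of `C` elements.

```python
def reflect_index(j, size, pad):
    """Map padded-output coordinate j to a source coordinate in [0, size)."""
    i = j - pad
    if i < 0:
        return -i                      # mirror around the first element
    if i >= size:
        return 2 * (size - 1) - i      # mirror around the last element
    return i


def reflection_pad2d_nhwc(src, H, W, C, pad):
    """Pad one image, stored as a flat NHWC list, by `pad` on all four sides.

    Illustrative only: assumes pad < H and pad < W.
    """
    out_h, out_w = H + 2 * pad, W + 2 * pad
    out = [0] * (out_h * out_w * C)
    for oh in range(out_h):
        ih = reflect_index(oh, H, pad)
        for ow in range(out_w):
            iw = reflect_index(ow, W, pad)
            s = (ih * W + iw) * C      # start of the source pixel's channel run
            d = (oh * out_w + ow) * C  # start of the destination pixel's channel run
            out[d:d + C] = src[s:s + C]  # copy all C channels in one contiguous slice
    return out
```

For a 2x3 image with 2 channels and `pad=1`, `reflection_pad2d_nhwc(list(range(12)), 2, 3, 2, 1)` returns a flat list of 4 * 5 * 2 = 40 values.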
The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.
### single core inference
```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.356 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 86.821 ms
(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.328 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 16.806 ms
```
### single socket inference
```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.142 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 7.367 ms
(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NHWC: 0.027 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NHWC: 3.181 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102518
Approved by: https://github.com/CaoE, https://github.com/cpuhrsch