Resize optimization for all architectures (#11956)
This patch optimizes the Resize operator when the input X is a 4D int8/uint8 tensor
and the mode is linear, by:
* Transforming NCHW Resize to NHWC variant
* Using the NHWC Resize kernel without floating-point computation
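To illustrate the second point, here is a minimal NumPy sketch of a bilinear NHWC resize for uint8 that uses only integer arithmetic: the half-pixel interpolation weights are precomputed once per axis as Q0.8 fixed-point values, so the per-pixel inner loop is pure integer multiply-add and shift. This is a hypothetical sketch of the technique, not the actual kernel from this patch; the function name and the Q0.8 format are assumptions for illustration.

```python
import numpy as np

def resize_linear_u8_nhwc(x, out_h, out_w):
    """Sketch: NHWC bilinear resize of a uint8 tensor using only integer
    (fixed-point) arithmetic. Illustrative only, not the patched kernel."""
    n, in_h, in_w, c = x.shape
    SHIFT = 8                # fractional bits of the fixed-point weights
    ONE = 1 << SHIFT         # 1.0 in Q0.8

    def axis_coeffs(in_size, out_size):
        # Half-pixel coordinate mapping, computed once per output axis.
        scale = in_size / out_size
        idx0, idx1, w1 = [], [], []
        for o in range(out_size):
            pos = (o + 0.5) * scale - 0.5
            i0 = int(np.floor(pos))
            frac = pos - i0
            idx0.append(min(max(i0, 0), in_size - 1))       # clamp to edges
            idx1.append(min(max(i0 + 1, 0), in_size - 1))
            w1.append(int(round(frac * ONE)))               # Q0.8 weight
        return np.array(idx0), np.array(idx1), np.array(w1, dtype=np.int32)

    y0, y1, wy = axis_coeffs(in_h, out_h)
    x0, x1, wx = axis_coeffs(in_w, out_w)

    xi = x.astype(np.int32)
    out = np.empty((n, out_h, out_w, c), dtype=np.uint8)
    for oy in range(out_h):
        r0 = xi[:, y0[oy]]                                  # (n, in_w, c)
        r1 = xi[:, y1[oy]]
        # Vertical lerp in Q8; keep the extra precision for the next step.
        row = r0 * (ONE - wy[oy]) + r1 * wy[oy]
        p0 = row[:, x0]                                     # (n, out_w, c)
        p1 = row[:, x1]
        # Horizontal lerp gives a Q16 result; round and shift back to uint8.
        val = p0 * (ONE - wx)[None, :, None] + p1 * wx[None, :, None]
        out[:, oy] = ((val + (1 << (2 * SHIFT - 1))) >> (2 * SHIFT)).astype(np.uint8)
    return out
```

Because the channel dimension is innermost in NHWC, the per-row gathers above touch contiguous memory, which is the main reason the patch transposes the layout before running the integer kernel.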
It improves DeepLab V3 with uint8 quantization by 19% on X64, and improves the
Resize operator of DeepLab V3 with int8 quantization by 15%-18% on X64.