NHWC Resize optimization (#11825)
The optimization consists of:
* Use int32_t instead of int64_t
* Use separate code paths for tf_crop_and_resize and for the other
coordinate_transformation_mode values to avoid redundant conditions
* Hoist loop-invariant code (offset and coefficient computation and the
extrapolation_value check) out of the loops
* Use fixed point to avoid floating-point computation
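
As a rough illustration of the int32_t, loop-invariant coefficient, and
fixed-point items above, here is a minimal 1-D sketch. It is not the code from
this change: the names (kFracBits, LinearCoeff, MakeCoeffs,
ResizeRowFixedPoint), the 10-bit fraction, and the half_pixel mapping with
clamping are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed fixed-point precision for this sketch (not taken from the change).
constexpr int32_t kFracBits = 10;
constexpr int32_t kOne = 1 << kFracBits;

// Hypothetical per-output-coordinate record: two input indices and a weight.
struct LinearCoeff {
  int32_t lo;    // index of the left input sample
  int32_t hi;    // index of the right input sample
  int32_t w_hi;  // weight of the right sample, scaled by kOne
};

// Computed once per output coordinate, outside the pixel loop.
// Floating point is used only here, not in the per-pixel loop.
std::vector<LinearCoeff> MakeCoeffs(int32_t in_len, int32_t out_len) {
  std::vector<LinearCoeff> coeffs(out_len);
  const float scale = static_cast<float>(in_len) / static_cast<float>(out_len);
  for (int32_t x = 0; x < out_len; ++x) {
    // half_pixel mapping, clamped to the valid input range (assumption).
    float src = (static_cast<float>(x) + 0.5f) * scale - 0.5f;
    src = std::min(std::max(src, 0.0f), static_cast<float>(in_len - 1));
    const int32_t lo = static_cast<int32_t>(src);
    const int32_t hi = std::min(lo + 1, in_len - 1);
    const int32_t w_hi = static_cast<int32_t>(
        std::lround((src - static_cast<float>(lo)) * kOne));
    coeffs[x] = {lo, hi, w_hi};
  }
  return coeffs;
}

// Integer-only interpolation of one uint8 row using the precomputed weights.
std::vector<uint8_t> ResizeRowFixedPoint(const std::vector<uint8_t>& in,
                                         int32_t out_len) {
  const auto coeffs = MakeCoeffs(static_cast<int32_t>(in.size()), out_len);
  std::vector<uint8_t> out(out_len);
  for (int32_t x = 0; x < out_len; ++x) {
    const LinearCoeff& c = coeffs[x];
    // At most 255 * kOne, so the accumulator fits comfortably in int32_t.
    const int32_t acc = in[c.lo] * (kOne - c.w_hi) + in[c.hi] * c.w_hi;
    out[x] = static_cast<uint8_t>((acc + kOne / 2) >> kFracBits);  // round
  }
  return out;
}
```

In a 2-D NHWC kernel the same table would presumably be built once per axis and
reused for every row and channel, which is where hoisting pays off.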
In addition, it always transforms NCHW Resize to NHWC, because the NHWC variant
is faster on ARM when the input X is a 4D int8/uint8 tensor and the mode is
linear.
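
For readers unfamiliar with the two layouts, the sketch below shows only the
index mapping between NCHW and NHWC for a 4D uint8 tensor. NchwToNhwc is a
hypothetical helper written for this note. It does not describe how the
transformation is implemented in this change.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies a dense NCHW tensor into NHWC order.
std::vector<uint8_t> NchwToNhwc(const std::vector<uint8_t>& src,
                                int32_t n, int32_t c, int32_t h, int32_t w) {
  std::vector<uint8_t> dst(src.size());
  for (int32_t ni = 0; ni < n; ++ni) {
    for (int32_t hi = 0; hi < h; ++hi) {
      for (int32_t wi = 0; wi < w; ++wi) {
        for (int32_t ci = 0; ci < c; ++ci) {
          // NCHW offset: channel planes are stored one after another.
          const std::size_t from =
              ((static_cast<std::size_t>(ni) * c + ci) * h + hi) * w + wi;
          // NHWC offset: all channels of one pixel are contiguous.
          const std::size_t to =
              ((static_cast<std::size_t>(ni) * h + hi) * w + wi) * c + ci;
          dst[to] = src[from];
        }
      }
    }
  }
  return dst;
}
```

With NHWC, the channels of a pixel sit next to each other in memory, so
interpolation weights computed once per output pixel can be applied across a
contiguous run of channels. This is a plausible reason for the speedup, not one
stated by this change.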
It improves DeepLab V3 with int8 quantization by 26% to 27% on the big core and
by 37% on the LITTLE core on AArch64. With uint8 quantization, it improves
DeepLab V3 by 24% to 25% on the big core and by 34% on the LITTLE core.
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>