Add INT16 and UINT16 compatibility for relu_quantizelinear (#20187)
### Description
The relu_quantizelinear transformer produces wrong results when the QuantizeLinear zero point is int16 or uint16. This PR adds support for those types.
### Motivation and Context
The transformer does not account for the case where Q's zero point is a
tensor(int16) or tensor(uint16), so optimizing such a model produces an
error.
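For reference, a minimal sketch of a model that hits the previously unhandled path: a Relu feeding a QuantizeLinear whose zero point is uint16. The names, shapes, and values here are illustrative (this is not the attached test case), and it assumes an onnx version supporting opset 21, where 16-bit types were added to QuantizeLinear:
```python
import onnx
from onnx import TensorProto, helper

# Scalar scale and a uint16 zero point -- the dtype the transformer mishandled.
scale = helper.make_tensor("scale", TensorProto.FLOAT, [], [0.05])
zero_point = helper.make_tensor("zp", TensorProto.UINT16, [], [32768])

graph = helper.make_graph(
    [
        helper.make_node("Relu", ["X"], ["relu_out"]),
        helper.make_node("QuantizeLinear", ["relu_out", "scale", "zp"], ["Y"]),
    ],
    "relu_quantize_uint16",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 64, 64, 128])],
    [helper.make_tensor_value_info("Y", TensorProto.UINT16, [1, 64, 64, 128])],
    initializer=[scale, zero_point],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
onnx.checker.check_model(model)
onnx.save(model, "relu_quantize_uint16.onnx")
```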
How to verify:
```python
import numpy as np
import onnx
import onnxruntime as ort

model_name = 'relu_quantize_testcase.onnx'
ort_input0 = np.random.rand(1, 64, 64, 128).astype(np.float32)

# Infer with graph optimizations disabled (the transformer is not applied).
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
ort_session = ort.InferenceSession(
    model_name,
    providers=["CPUExecutionProvider"],
    sess_options=so,
)
outputs = [x.name for x in ort_session.get_outputs()]
ort_outs_mod = ort_session.run(outputs, {'generator/conv2d_input/conv2d/Conv2D:0': ort_input0})
del ort_session

# Infer with the default optimization level (the transformer is applied).
model_orig = onnx.load(model_name)
ort_session_orig = ort.InferenceSession(
    model_orig.SerializeToString(),
    providers=["CPUExecutionProvider"],
)
outputs_orig = [x.name for x in ort_session_orig.get_outputs()]
ort_outs_orig = ort_session_orig.run(outputs_orig, {'generator/conv2d_input/conv2d/Conv2D:0': ort_input0})
del ort_session_orig

# Compare the two runs.
print(np.linalg.norm(ort_outs_mod[0].astype(np.float32) - ort_outs_orig[0].astype(np.float32)))
```
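With this fix, the printed norm of the difference between the unoptimized and optimized outputs should be near zero; before the fix, optimizing a model with a 16-bit zero point produced wrong results.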
[relu_quantize_testcase.zip](https://github.com/microsoft/onnxruntime/files/14848160/relu_quantize_testcase.zip)
---------
Co-authored-by: genmingz <genming.zhong@amd.com>