Align AvgPool ceil_mode on last value to torch (#16752)
Fix #16203
Before this PR, when `ceil_mode` was on, a window's average was always
computed by dividing by the full kernel size, even when fewer pixels
than the kernel size remained in the last window. This caused the
operator's output to differ between ORT and torch.
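
A minimal sketch of the divisor difference, in plain Python rather than the actual ORT kernel. It assumes a 1-D input with no padding; `avg_pool_1d` and its `clip_divisor` flag are hypothetical names used only for illustration:

```python
import math

def avg_pool_1d(values, kernel, stride, clip_divisor):
    """1-D average pooling with ceil_mode on and no padding (illustrative only)."""
    n = len(values)
    # ceil_mode: round the window count up, so the last window may run off the end
    out_len = math.ceil((n - kernel) / stride) + 1
    out = []
    for i in range(out_len):
        start = i * stride
        window = values[start:start + kernel]  # slicing clips at the input edge
        # Old behavior: always divide by the kernel size.
        # Fixed behavior: divide by the number of pixels actually in the window.
        divisor = len(window) if clip_divisor else kernel
        out.append(sum(window) / divisor)
    return out

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(avg_pool_1d(data, kernel=2, stride=2, clip_divisor=False))  # [1.5, 3.5, 2.5]
print(avg_pool_1d(data, kernel=2, stride=2, clip_divisor=True))   # [1.5, 3.5, 5.0]
```

The last window holds a single pixel, so only the last output value changes.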
However, this fix only applies to the implementation changed in #15597,
which only supports AvgPool from opset 19 onward. Older opset versions
remain the same, as they use the mlas files.
Also, this PR fixes the shape mismatch caused by a sliding window
starting in the padding. More detail:
https://github.com/onnx/onnx/pull/6650 (this PR is also validated with
the tests added in https://github.com/onnx/onnx/pull/6650).
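
For reference, a rough sketch of the output-length rule that torch applies with `ceil_mode`: a window that would start in the right padding is dropped. This is an assumption about the intended behavior, written for one spatial dimension with symmetric padding and no dilation; `pooled_length` is a hypothetical helper, not a function from ORT or ONNX:

```python
import math

def pooled_length(n, kernel, stride, pad, ceil_mode):
    """Output length along one axis for a pooling op (illustrative only)."""
    span = n + 2 * pad - kernel
    out = (math.ceil(span / stride) if ceil_mode else span // stride) + 1
    # With ceil_mode, the last window must start inside the input or the
    # left padding; if it would start in the right padding, drop it.
    if ceil_mode and (out - 1) * stride >= n + pad:
        out -= 1
    return out

# Without the extra check this case would yield 3 outputs, the last of
# which would cover only padding.
print(pooled_length(n=3, kernel=2, stride=2, pad=1, ceil_mode=True))  # 2
```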