Use int64_t index type in multiplications to avoid integer overflow in max_pool2d and avg_pool2d on CUDA (#68682)
Fix https://github.com/pytorch/pytorch/issues/68418
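For context, a minimal sketch of the overflow class this change addresses. This is not the actual kernel code: `plane_offset` and `fits_in_int32` are hypothetical helpers and the shape is illustrative only. It assumes a contiguous NCHW layout and shows that promoting the first operand to `int64_t` makes the whole product evaluate in 64 bits, whereas the same product in plain `int` could exceed the 32-bit range.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the PR): flat offset of the first element of
// plane (n, c) in a contiguous NCHW tensor. Casting the first operand to
// int64_t promotes every later multiplication in the chain to 64 bits.
constexpr int64_t plane_offset(int n, int c, int channels, int height, int width) {
    return (static_cast<int64_t>(n) * channels + c) * height * width;
}

// Hypothetical helper: true if the value is representable as a 32-bit int,
// i.e. an all-int computation of the same product would not have overflowed.
constexpr bool fits_in_int32(int64_t v) {
    return v >= std::numeric_limits<int32_t>::min() &&
           v <= std::numeric_limits<int32_t>::max();
}

// Illustrative shape: batch index 70, 64 channels, 1024 x 1024 planes.
// The offset is 70 * 64 * 1024 * 1024, which is larger than INT32_MAX.
constexpr int64_t kLargeOffset = plane_offset(70, 0, 64, 1024, 1024);
```

With `int` operands only, the expression `(n * channels + c) * height * width` would be evaluated in 32 bits and overflow for this shape; the cast on the first operand is what avoids that.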
- [X] operator benchmark: https://github.com/xwang233/code-snippet/tree/master/pooling-bench-68682; regressions of 10% or worse are seen for some shapes
- [X] end-to-end benchmark: no major regressions seen in our test suites
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68682
Approved by: https://github.com/ngimel