Fix max_pool2d nchw backward bug (#38953)
Summary:
Fix https://github.com/pytorch/pytorch/issues/38764
The problem is that the `top_diff` and `top_mask` pointers are shifted cumulatively inside the for-n and for-c loops. This can cause overflow and illegal memory access when either loop runs more than one iteration, that is, when n > 65535 or c > 65535 (the case in https://github.com/pytorch/pytorch/issues/38764). Since neither n > 65535 nor c > 65535 is common, the bug had not been seen before. The simple fix is to use new pointer variables for the n and c offsets instead of directly modifying `top_diff` or `top_mask`.
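As a rough illustration of the cumulative shift (a host-side Python sketch, not the actual CUDA kernel), the snippet below tracks the element offset one block would use for each (n, c) plane it visits. The function name `plane_offsets`, its parameters, and the small grid sizes standing in for 65535 are all hypothetical, and the offset formula `(n * channels + c) * plane_size` is an assumption about the contiguous NCHW layout.

```python
def plane_offsets(num, channels, plane_size, grid_x, grid_y,
                  block_x=0, block_y=0, cumulative=False):
    """Return the element offset used for each (n, c) plane one block visits.

    Hypothetical model: `base` stands in for the `top_diff` / `top_mask`
    pointer, and grid_x / grid_y stand in for the grid dimensions.
    """
    visited = []
    base = 0
    for n in range(block_x, num, grid_x):
        for c in range(block_y, channels, grid_y):
            offset = (n * channels + c) * plane_size
            if cumulative:
                # Buggy pattern: the base pointer itself is modified, so
                # every later iteration starts from an already shifted base.
                base += offset
                visited.append(base)
            else:
                # Fixed pattern: a fresh local pointer per iteration.
                visited.append(base + offset)
    return visited


# 5 channels but only 2 blocks along the channel axis, so block 0
# handles c = 0, 2, 4 (three loop iterations).
fixed = plane_offsets(1, 5, 4, grid_x=1, grid_y=2)
buggy = plane_offsets(1, 5, 4, grid_x=1, grid_y=2, cumulative=True)
print(fixed)  # [0, 8, 16]
print(buggy)  # [0, 8, 24]; 24 is past the 1 * 5 * 4 = 20 element buffer
```

When the grid covers every n and c (a single iteration per loop), both variants give the same offsets, which is consistent with the bug only showing up for very large n or c.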
Separately, I think the current nchw max_pool2d GPU impl still has plenty of room for performance improvement. We can look into that in a later PR if needed.
This PR also slightly cleans up the indentation and adds tests that use the CPU impl as a reference check.
cc skrah
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38953
Differential Revision: D21721930
Pulled By: ezyang
fbshipit-source-id: fef7d911d814f8ed9fd67c60cabe5d52f8fd3d57