Fix max_pool2d perf regression (#41174)
Summary:
The two pointer variables `ptr_top_diff` and `ptr_top_mask` were introduced in https://github.com/pytorch/pytorch/issues/38953. End-to-end testing showed a training performance regression caused by that change. Performance is restored by removing the two pointer variables and adding the offset directly in the `[]` index calculations below.
See the change in that PR: https://github.com/pytorch/pytorch/pull/38953/files#diff-8085d370f4e98295074a51b8a1f829e9R187-R188
https://github.com/pytorch/pytorch/blob/e4a3c584d51662d4c14060fc8517464fe3c12142/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu#L186-L195
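For illustration only, the two indexing styles can be sketched on the host as below. This is a simplified stand-in, not the actual kernel: the function names `sum_with_pointers`, `sum_with_offset`, and `variants_agree`, the flat loop, and the sample data are all assumptions made for the example. It shows only that both forms read the same elements; it does not reproduce the GPU performance difference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical variant A: pre-offset pointer variables, the pattern this
// change removes (names mirror `ptr_top_diff` / `ptr_top_mask`).
float sum_with_pointers(const float* top_diff, const std::int64_t* top_mask,
                        std::size_t offset, std::size_t count,
                        std::int64_t target) {
  assert(top_diff != nullptr && top_mask != nullptr);
  const float* ptr_top_diff = top_diff + offset;
  const std::int64_t* ptr_top_mask = top_mask + offset;
  float gradient = 0.0f;
  for (std::size_t i = 0; i < count; ++i) {
    if (ptr_top_mask[i] == target) {
      gradient += ptr_top_diff[i];
    }
  }
  return gradient;
}

// Hypothetical variant B: the offset is added directly inside the [] index,
// the pattern this change restores.
float sum_with_offset(const float* top_diff, const std::int64_t* top_mask,
                      std::size_t offset, std::size_t count,
                      std::int64_t target) {
  assert(top_diff != nullptr && top_mask != nullptr);
  float gradient = 0.0f;
  for (std::size_t i = 0; i < count; ++i) {
    if (top_mask[i + offset] == target) {
      gradient += top_diff[i + offset];
    }
  }
  return gradient;
}

// Runs both variants on made-up sample data and reports whether they agree.
bool variants_agree() {
  const std::vector<float> diff = {0.5f, 1.0f, 2.0f, 4.0f, 8.0f, 16.0f};
  const std::vector<std::int64_t> mask = {7, 3, 7, 3, 7, 7};
  const float a = sum_with_pointers(diff.data(), mask.data(), 2, 4, 7);
  const float b = sum_with_offset(diff.data(), mask.data(), 2, 4, 7);
  return a == b;
}
```

Both variants are functionally equivalent, so the change affects only how the addresses are computed, not the gradient values.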
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41174
Differential Revision: D22451565
Pulled By: ngimel
fbshipit-source-id: 37ed6b9fd785e1be31a027ef5d60794656cc575a