Fix incorrect MaxPool3d CUDA backward results for non-square output (#36820)
Summary:
In the CUDA implementation of max_pool3d backward, `max_pool3d_with_indices_backward_out_frame` declares its parameters in the order `..., oheight, owidth, ...` but is called with `..., owidth, oheight, ...`. When the output height and width differ, the launch grid is therefore too small along the longer dimension, and gradients along that dimension are not fully computed.
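The effect of the swap can be illustrated with a small host-side model. This is only a sketch, not the kernel code: the block shape (32 x 8) and the grid formula (ceil-divide `owidth` over x and `oheight` over y) are assumptions made for illustration, and `launch_coverage` and `uncovered` are hypothetical helper names.

```python
from math import ceil

# Assumed thread-block shape, for illustration only.
BLOCK_X, BLOCK_Y = 32, 8


def launch_coverage(oheight, owidth):
    """Rows and columns covered by a grid sized from the given arguments.

    Assumes grid.x = ceil(owidth / BLOCK_X) and grid.y = ceil(oheight / BLOCK_Y).
    """
    grid_x = ceil(owidth / BLOCK_X)
    grid_y = ceil(oheight / BLOCK_Y)
    return grid_y * BLOCK_Y, grid_x * BLOCK_X


def uncovered(true_oheight, true_owidth, swapped):
    """Count output rows and columns that no thread would visit."""
    if swapped:
        # Models the bug: the caller passes (owidth, oheight) into
        # parameters declared as (oheight, owidth).
        rows, cols = launch_coverage(true_owidth, true_oheight)
    else:
        rows, cols = launch_coverage(true_oheight, true_owidth)
    return max(0, true_oheight - rows), max(0, true_owidth - cols)


if __name__ == "__main__":
    print(uncovered(4, 40, swapped=False))  # correct argument order
    print(uncovered(4, 40, swapped=True))   # swapped argument order
    print(uncovered(16, 16, swapped=True))  # square output
```

Under these assumptions, an output of height 4 and width 40 leaves the last 8 columns without a thread when the arguments are swapped, while a square output is unaffected, which is consistent with the bug only showing up for non-square outputs.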
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36820
Differential Revision: D21120078
Pulled By: ngimel
fbshipit-source-id: d061726647a4a45d45d5c1a00f2f1cf2745726a8