Fix max_pool2d NHWC for large tensors; fix incorrect use of cudaGetLastError() (#34519)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/33988 and https://github.com/pytorch/pytorch/issues/34083.
Previously, the max_pool2d_nhwc kernels used shared memory whose size was proportional to the tensor size (c \* h \* w). When the tensor was too large, the requested shared memory exceeded the per-block limit and the kernel launch failed.
This PR follows the approach used in AdaptiveAvgPool2d_nhwc: the "C" dimension is split across additional blocks along grid.x. With that change, the per-block shared-memory size is bounded (below the 48 KB limit) regardless of tensor size.
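The sizing logic behind the split can be sketched as follows. This is an illustrative Python model, not the actual CUDA kernel code; the function name, the per-channel byte cost, and the 48 KB budget are assumptions for the example, not values taken from the PR.

```python
# Hypothetical sketch: choose how many channels each block handles so
# that per-block shared memory stays under a fixed budget, and split the
# remaining channels across extra blocks along grid.x.

MAX_SHARED_BYTES = 48 * 1024  # common per-block shared-memory limit


def split_channels(c, bytes_per_channel):
    """Return (channels_per_block, num_blocks_x).

    channels_per_block * bytes_per_channel never exceeds
    MAX_SHARED_BYTES, no matter how large c is.
    """
    max_c_per_block = max(1, MAX_SHARED_BYTES // bytes_per_channel)
    channels_per_block = min(c, max_c_per_block)
    num_blocks_x = -(-c // channels_per_block)  # ceiling division
    return channels_per_block, num_blocks_x
```

For a very large channel count, the shared-memory request per block stays bounded while grid.x grows, which is the property that prevents the launch failure described above.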
A benchmark is available [here](https://github.com/xwang233/code-snippet/blob/0b98146089ffca65d3d56968a9eafbb401a82493/max-pool2d/max-pool2d.ipynb). TL;DR: barely any performance drop was observed.
cc csarofeen ptrblck jjsjann123 VitalyFedyunin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34519
Differential Revision: D20388848
Pulled By: VitalyFedyunin
fbshipit-source-id: 9454f385f9315afaab4a05303305578bbcd80b87