Check if input is ChannelsLast or ChannelsLast3d for quantized AdaptivePool3d. (#42780)
Summary:
cc z-a-f, vkuzo. This is a simple first step toward resolving the issue described in https://github.com/pytorch/pytorch/issues/42779.
# Description
`ChannelsLast` and `ChannelsLast3d` are distinct memory formats [(MemoryFormat.h)](https://github.com/pytorch/pytorch/blob/4e93844ab168ee0cf1aaa1b4712d6aad0e2972f8/c10/core/MemoryFormat.h#L27), so a check for `ChannelsLast` alone does not match `NDHWC` inputs and the "fast" path for `NDHWC` is never taken.
This PR checks for both formats, so the fast path is taken as expected for 4-dimensional inputs (5-dimensional when the batch dimension is included).
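
To illustrate why the two formats are not interchangeable, here is a minimal pure-Python sketch that computes the strides of a channels-last layout and checks them per rank. The helper names (`channels_last_strides`, `matches_channels_last_2d`, `matches_channels_last_3d`) are hypothetical and not part of PyTorch. The sketch also assumes a strict stride comparison, whereas the real check in c10 is more lenient about size-1 dimensions.

```python
def contiguous_strides(sizes):
    """Row-major (NCHW / NCDHW) strides, in elements."""
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return tuple(strides)


def channels_last_strides(sizes):
    """Strides for logical order (N, C, *spatial) stored as (N, *spatial, C)."""
    ndim = len(sizes)
    # Physical order of the logical dims: batch, spatial dims, then channels.
    order = [0] + list(range(2, ndim)) + [1]
    strides = [0] * ndim
    running = 1
    for dim in reversed(order):  # innermost (fastest-varying) dim first
        strides[dim] = running
        running *= sizes[dim]
    return tuple(strides)


def matches_channels_last_2d(sizes, strides):
    # Simplified stand-in for a ChannelsLast (NHWC) check: 4-d tensors only.
    return len(sizes) == 4 and tuple(strides) == channels_last_strides(sizes)


def matches_channels_last_3d(sizes, strides):
    # Simplified stand-in for a ChannelsLast3d (NDHWC) check: 5-d tensors only.
    return len(sizes) == 5 and tuple(strides) == channels_last_strides(sizes)


sizes_5d = (2, 16, 4, 5, 6)                  # N, C, D, H, W
ndhwc = channels_last_strides(sizes_5d)      # (1920, 1, 480, 96, 16)

# An NDHWC tensor fails the 2d check, so testing only for ChannelsLast
# sends it down the slow path; testing for either format does not.
takes_fast_path_before = matches_channels_last_2d(sizes_5d, ndhwc)
takes_fast_path_after = (matches_channels_last_2d(sizes_5d, ndhwc)
                         or matches_channels_last_3d(sizes_5d, ndhwc))
```

In this sketch `takes_fast_path_before` is `False` and `takes_fast_path_after` is `True`, which mirrors the behaviour change described above.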
# Benchmarks
## Notes
- For channels `< 8`, the new path is slower than before.
- For `qint32`, the new path is `2x` slower than before.
- For channels `> 8`, execution time drops by up to `9-10x` in the benchmarks.
- Although execution time improves, the channels-last path remains slower than the `contiguous` variant when channels `> 64`.
## C++
<img width="1667" alt="before_after_py" src="https://user-images.githubusercontent.com/37529096/89711911-5da22d80-d9e1-11ea-9b30-0c23d46c2c93.png">
## Python
<img width="1523" alt="before_after_cpp" src="https://user-images.githubusercontent.com/37529096/89711906-58dd7980-d9e1-11ea-9696-1963f394198a.png">
## Reproduce
See https://github.com/pytorch/pytorch/issues/42779.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42780
Reviewed By: smessmer
Differential Revision: D23035424
Pulled By: z-a-f
fbshipit-source-id: 15594846f66b73c22d2371eb8e47c472324d6139