Cudnn bn size fix (#32763)
Summary:
Should fix https://github.com/pytorch/pytorch/issues/29744 by falling back to the native batch norm implementation if cuDNN cannot execute the provided shape.
Shape numbers were verified for cuDNN 7.6.5.32 with tensor shapes:
```python
# for spatial bn
x = torch.Size([880801, 256, 5])
x = torch.Size([65535, 256, 5])
x = torch.Size([880801, 64, 4, 4])
x = torch.Size([65535, 64, 4, 4])
# for per-act bn
x = torch.Size([131070, 2048])
x = torch.Size([262136, 2048])
```
for `train()` and `eval()` modes using `torch.float32` and `torch.float16`.
I've increased the shape used in our current smoke test, but I can also add all use cases from the support matrix, if wanted.
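For illustration, here is a minimal sketch of the kind of batch-size check the fallback implies, written against the shapes listed above. `can_use_cudnn_batch_norm` is a hypothetical helper, not the actual dispatch code, and the mapping of each limit to training versus eval mode is an assumption.
```python
import torch

# Hypothetical helper: decide whether cuDNN batch norm can handle a given
# input shape, based on the batch-size limits verified above for cuDNN
# 7.6.5.32. The training/eval split per limit is an assumption.
def can_use_cudnn_batch_norm(shape: torch.Size, training: bool) -> bool:
    if len(shape) == 2:    # per-activation batch norm (2d inputs)
        limit = 131070 if training else 262136
    else:                  # spatial batch norm (3d and 4d inputs)
        limit = 880801 if training else 65535
    return shape[0] <= limit

# Batches beyond the limit would dispatch to the native implementation
# instead of failing inside cuDNN.
print(can_use_cudnn_batch_norm(torch.Size([880801, 256, 5]), training=True))   # True
print(can_use_cudnn_batch_norm(torch.Size([880802, 256, 5]), training=True))   # False
```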
CC ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32763
Differential Revision: D19644328
Pulled By: ngimel
fbshipit-source-id: c2151bf9fe6bac79b8cbc69cff517a4b0b3867aa