AdaptiveAvgPooling nhwc cuda update (#29700)
Summary:
1. Clip the grid launch configs (tests added in test_nn.py).
2. Assert on the shared memory requirement, which gives a clearer hint when a launch errors out.
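
A minimal host-side sketch of the two changes above, not the code from this PR. The names DeviceLimits, GridDims, clip_grid and check_shared_mem are hypothetical, and the limits are passed in by the caller rather than queried from the device:

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical container for device limits; real code would read these
// from the device properties instead of taking them from the caller.
struct DeviceLimits {
  int max_grid_x;
  int max_grid_y;
  int max_grid_z;
  std::size_t shared_mem_per_block;  // bytes
};

// Hypothetical stand-in for a 3-D grid size.
struct GridDims {
  int x;
  int y;
  int z;
};

// Clip each requested grid dimension to the device maximum. A kernel
// launched with a clipped grid has to loop over the remaining work itself.
GridDims clip_grid(const GridDims& requested, const DeviceLimits& lim) {
  return GridDims{std::min(requested.x, lim.max_grid_x),
                  std::min(requested.y, lim.max_grid_y),
                  std::min(requested.z, lim.max_grid_z)};
}

// Fail before the launch with a message that names both sizes, rather
// than letting the launch fail with a generic error.
void check_shared_mem(std::size_t required, const DeviceLimits& lim) {
  if (required > lim.shared_mem_per_block) {
    throw std::runtime_error(
        "requested shared memory (" + std::to_string(required) +
        " bytes) exceeds the per-block limit (" +
        std::to_string(lim.shared_mem_per_block) + " bytes)");
  }
}
```

In this sketch a caller would run check_shared_mem first, then pass the result of clip_grid to the launch.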
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29700
Differential Revision: D18482556
Pulled By: VitalyFedyunin
fbshipit-source-id: df3f653185d7b477b2241f2ef4779670e9a78899