(#28927)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22526.
Limit the grid size in the launch configuration as well. The previous code requested more blocks than the hardware supports.
Add a test in test_cuda.
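For illustration only, a minimal host-side sketch of the kind of clamping described above. This is not the code from this PR: `ceil_div`, `clamped_grid_size`, and the `max_grid` parameter are hypothetical names, and the limit is passed in by the caller rather than taken from the device. Real code would presumably read it from `cudaDeviceProp::maxGridSize`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: integer division rounded up.
inline int64_t ceil_div(int64_t a, int64_t b) {
  return (a + b - 1) / b;
}

// Hypothetical helper: number of blocks needed to cover num_items with
// block_size threads per block, capped at max_grid.
// max_grid is an assumption supplied by the caller; on a real device it
// would come from cudaDeviceProp::maxGridSize for the relevant dimension.
inline int64_t clamped_grid_size(int64_t num_items,
                                 int64_t block_size,
                                 int64_t max_grid) {
  return std::min(ceil_div(num_items, block_size), max_grid);
}
```

When the grid is capped this way, the launched blocks no longer cover every item one-to-one, so a kernel would typically need to loop over the remaining work (for example with a grid-stride loop).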
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28927
Differential Revision: D18241759
Pulled By: soumith
fbshipit-source-id: 8f2535bb0bc4ea7998024b137576a38067668999