Fix launch bounds for gatherTopK kernel (#60314)
Summary:
Changed the launch bounds of the gatherTopK kernel so that registers no longer spill into local memory.
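For context, a rough sketch of why a launch bound affects spilling: the bound caps the threads per block, which in turn sets how many registers the compiler may give each thread before it has to spill to local memory. The helper name `registerBudget` is hypothetical, and the hardware limits are assumptions for a Volta-class part such as the Titan V (65536 32-bit registers per SM, at most 255 per thread). The actual bound values changed in this PR are not restated here.

```cpp
#include <algorithm>
#include <cassert>

// Assumed limits for a Volta-class GPU (compute capability 7.0).
// Other architectures may differ; check the CUDA programming guide.
constexpr int kRegistersPerSM = 65536;       // 32-bit registers per SM
constexpr int kMaxRegistersPerThread = 255;  // per-thread hardware cap

// Hypothetical helper: approximate upper bound on registers per thread that
// still lets `minBlocksPerSM` blocks of `maxThreadsPerBlock` threads be
// resident on one SM. It ignores register allocation granularity, so the
// real budget may be slightly lower.
inline int registerBudget(int maxThreadsPerBlock, int minBlocksPerSM = 1) {
  assert(maxThreadsPerBlock > 0 && minBlocksPerSM > 0);
  const int perThread = kRegistersPerSM / (maxThreadsPerBlock * minBlocksPerSM);
  return std::min(kMaxRegistersPerThread, perThread);
}
```

Under these assumptions, a bound of 1024 threads per block leaves about 64 registers per thread, while 512 leaves about 128. A kernel that needs more than its budget is compiled with spills to local memory.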
Comparison (NVIDIA Titan V GPU):
Args: Input size as below, k=32, dim=None
![TopKTimingData](https://user-images.githubusercontent.com/22803332/122624922-46978780-d057-11eb-9b52-d5786da432c0.PNG)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60314
Reviewed By: mruberry
Differential Revision: D29267789
Pulled By: ngimel
fbshipit-source-id: 4056efb2e44e5527786167af66a127504980a3af