Fix launch bounds for gatherTopK kernel (#60314)
Summary:
Changed launch bounds for gatherTopK kernel to fix register spilling into local memory.
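
For context, a rough sketch of why the launch bound affects spilling: the bound tells the compiler the largest block size the kernel may be launched with, and the compiler then has to keep per-thread register use low enough for a block of that size to fit. The figures below (65,536 registers available to a block, at most 255 per thread) are assumed Volta-class limits, and `max_regs_per_thread` is a hypothetical helper for illustration only, not code from this change. It also ignores register allocation granularity.

```cpp
#include <algorithm>

// Assumed hardware limits (Volta-class GPU); not taken from this commit.
constexpr int kRegsPerBlock = 65536;    // 32-bit registers available to one block
constexpr int kMaxRegsPerThread = 255;  // architectural per-thread cap

// Hypothetical helper: upper limit on registers per thread that still lets a
// block of `max_threads_per_block` threads fit in the register budget.
constexpr int max_regs_per_thread(int max_threads_per_block) {
  return std::min(kMaxRegsPerThread, kRegsPerBlock / max_threads_per_block);
}

// A larger launch bound leaves fewer registers per thread; anything the
// kernel needs beyond that limit is spilled to local memory.
static_assert(max_regs_per_thread(1024) == 64, "1024 threads -> 64 regs");
static_assert(max_regs_per_thread(512) == 128, "512 threads -> 128 regs");
static_assert(max_regs_per_thread(256) == 255, "capped at 255 regs");
```

Under these assumptions, a kernel that needs more than 64 registers per thread will spill if it is compiled with a bound of 1024 threads per block, while a smaller bound gives the compiler more room.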
Comparison (NVIDIA Titan V GPU):
Args: Input size as below, k=32, dim=None

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60314
Reviewed By: mruberry
Differential Revision: D29267789
Pulled By: ngimel
fbshipit-source-id: 4056efb2e44e5527786167af66a127504980a3af