Add launch bounds for TopK kernel, be more conservative in sorting (#17296)
Summary:
The reported use case is maskrcnn running on a Jetson TX2.
Fixes #17144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17296
Differential Revision: D14147886
Pulled By: soumith
fbshipit-source-id: 44d5a89aaeb4cc07d1b53dd90121013be93c419c