Do not ifdef __launch_bounds__ out for ROCm. (#15228)
Summary:
The compiler understands the attribute and benefits from it: without it,
the compiler assumes the default workgroup size of 256 and may use too
many VGPRs.
Fixes a problem in bringup of ROCm 2.0 on gfx906.
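
For illustration, a minimal sketch of keeping the attribute on both CUDA and
ROCm builds. The names `kExampleMaxThreads`, `example_scale`, and
`fits_launch_bounds` are hypothetical and do not come from this change; the
bound of 512 is an assumed value. The kernel is only compiled by a device
compiler (nvcc or hipcc), while the host-side helper also builds with a plain
C++ compiler.

```cpp
// Hypothetical names throughout; a sketch, not the code touched by this PR.
constexpr int kExampleMaxThreads = 512;  // assumed bound for this example

#if defined(__CUDACC__) || defined(__HIPCC__)
// The attribute is not wrapped in a ROCm-specific #ifdef, so both device
// compilers see the promised maximum number of threads per block.
__global__ void __launch_bounds__(kExampleMaxThreads)
example_scale(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    data[i] *= factor;
  }
}
#endif

// Host-side check that a launch configuration respects the promised bound.
constexpr bool fits_launch_bounds(int threads_per_block) {
  return threads_per_block > 0 && threads_per_block <= kExampleMaxThreads;
}
```

A launch with more threads per block than the declared bound is invalid, so
a caller could check `fits_launch_bounds(block_size)` before launching.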
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15228
Differential Revision: D13470950
Pulled By: bddppq
fbshipit-source-id: f9aa44c7c95299a099c0ea9317b9044cc056acc5