Add launch bounds, enable more tests (#18909)
Summary:
Add launch bounds annotations for ROCm to kernels whose block sizes are derived from maxThreadsPerBlock or from the thread counts used by the apply kernels.
Enable the tests that now pass with these annotations.
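For readers unfamiliar with launch bounds, here is a minimal sketch of the general technique, not the code touched by this change. The kernel `scale_kernel`, the helper `clamp_block_size`, and the constant `kMaxThreadsPerBlock` are hypothetical names chosen for illustration, and the value 1024 is an assumed limit. The `__launch_bounds__` qualifier tells the compiler the largest block size the kernel will be launched with; a launch that requests more threads per block than the declared bound fails, so the host side should clamp its block size to the same constant.

```cpp
#include <algorithm>
#include <cassert>

// Assumed upper bound for this sketch; real code would take it from the
// device properties or from a project-wide constant.
constexpr int kMaxThreadsPerBlock = 1024;

// Host-side helper (hypothetical): never request more threads per block than
// the bound declared on the kernel or the limit reported by the device.
inline int clamp_block_size(int requested, int device_max) {
  assert(requested > 0 && device_max > 0);
  return std::min(requested, std::min(device_max, kMaxThreadsPerBlock));
}

// The kernel is compiled only by a CUDA or HIP compiler; a plain C++
// compiler sees just the host helper above.
#if defined(__CUDACC__) || defined(__HIPCC__)
__global__ void __launch_bounds__(kMaxThreadsPerBlock)
scale_kernel(float* data, float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    data[i] *= alpha;  // each thread scales one element
  }
}
#endif
```

A caller would compute `int block = clamp_block_size(requested, prop.maxThreadsPerBlock);` and launch with that block size, keeping the launch configuration and the annotation in agreement.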
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909
Differential Revision: D14801490
Pulled By: ezyang
fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7