hardswish: add cuda kernels (#36350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36350
Adds CUDA kernels for hardswish in order to unblock use in training.
Test Plan:
added test coverage for forward pass
ran this script for various input sizes to test backward pass against a manual Hardswish module: https://gist.github.com/vkuzo/30e196b059427725817f2ee934ed0384
Imported from OSS
Differential Revision: D20955590
fbshipit-source-id: 635706fbf18af9a4205f2309f3314f2996df904d