Improving BinaryOpsKernel.cu (#29428)
Summary:
- Building `BinaryOpsKernel.cu` takes an extremely long time. Split the original file into 3 pieces and copy-paste the existing code into them.
- Remove some unnecessary logic
- Rename some incorrectly named ops: `*_cpu` -> `*_cuda`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29428
Differential Revision: D18408858
Pulled By: VitalyFedyunin
fbshipit-source-id: 29323b0bc40a928ae698345ad1ffe46c5851b012