Split up BinaryArithmeticKernel.cu to speed up compilation time. (#38263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38263
On my machine, compile time dropped from 4m8s for the original file to 2m22s for the slowest of the resulting split files.
Test Plan: Imported from OSS
Differential Revision: D21508985
Pulled By: gchanan
fbshipit-source-id: 2917cd5f30c6b31229053cada93c95e3a27ab29a