Code-dedup in PowKernel (#57873)
Summary:
Both the CPU and CUDA versions of PowKernel reimplement functionality that
already exists in UnaryOps, such as sqrt, rsqrt and reciprocal.
Found this out while looking into the sluggish compilation of PowerKernel.cu:
- Before the change, compilation took 11m5s and produced a 7.6MB .o file
- After the change, compilation finished in 10m20s and produced a 6.4MB .o file
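
A rough sketch of the dedup idea, not the actual ATen code: for a scalar
exponent, the pow kernel can forward the special cases to the shared unary
kernels instead of carrying its own copies. The names below
(sqrt_kernel, rsqrt_kernel, reciprocal_kernel, pow_scalar_kernel) and the
std::vector "tensor" are hypothetical stand-ins for illustration only.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-ins for the existing unary kernels (NOT the ATen API).
inline void sqrt_kernel(std::vector<double>& v) {
  for (auto& x : v) x = std::sqrt(x);
}
inline void rsqrt_kernel(std::vector<double>& v) {
  for (auto& x : v) x = 1.0 / std::sqrt(x);
}
inline void reciprocal_kernel(std::vector<double>& v) {
  for (auto& x : v) x = 1.0 / x;
}

// Hypothetical pow kernel for a scalar exponent: special exponents are
// forwarded to the shared unary kernels, so only the generic path lives here.
inline void pow_scalar_kernel(std::vector<double>& v, double exponent) {
  if (exponent == 0.5) {
    sqrt_kernel(v);        // x^0.5  == sqrt(x)
  } else if (exponent == -0.5) {
    rsqrt_kernel(v);       // x^-0.5 == 1 / sqrt(x)
  } else if (exponent == -1.0) {
    reciprocal_kernel(v);  // x^-1   == 1 / x
  } else {
    for (auto& x : v) x = std::pow(x, exponent);  // generic fallback
  }
}
```

Forwarding this way means each special case is compiled once, in the unary
kernel, rather than a second time inside the pow kernel; this is consistent
with the smaller .o file reported above.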
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57873
Reviewed By: ezyang
Differential Revision: D28304929
Pulled By: malfet
fbshipit-source-id: ac499476280de55a92044b1b041b1246eea74c64