CMake: Include instead of copying cpu kernel files (#67656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67656
Currently, each CPU kernel file is copied into the build folder 3 times so that each copy can be compiled with different compilation flags. This change instead generates 3 files that `#include` the original file. The biggest difference is that picking up an edit to a copied file requires `cmake` to re-run, whereas include dependencies are natively handled by `ninja`.
A side benefit is that included files show up directly in the build dependency graph, whereas `cmake` file copies don't.
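For illustration only, a minimal sketch of the include-wrapper approach described above. This is not the code from this PR: the target name `my_kernels`, the source `kernel.cpp`, the capability names, and the GCC/Clang-style flags are all hypothetical.

```cmake
# Sketch only: every name and flag below is a hypothetical placeholder.
cmake_minimum_required(VERSION 3.17)
project(include_wrapper_sketch CXX)

# The single original kernel source that stays in the source tree.
set(KERNEL_SRC "${CMAKE_CURRENT_SOURCE_DIR}/kernel.cpp")

# Assumed capability names and per-capability flags (GCC/Clang syntax).
set(CAPABILITIES DEFAULT AVX2 AVX512)
set(FLAGS_DEFAULT "")
set(FLAGS_AVX2 "-mavx2")
set(FLAGS_AVX512 "-mavx512f")

set(WRAPPERS)
foreach(cap IN LISTS CAPABILITIES)
  set(wrapper "${CMAKE_CURRENT_BINARY_DIR}/kernel.${cap}.cpp")
  # Generate a one-line file that includes the original instead of copying it.
  # file(GENERATE) only rewrites the output when its content changes.
  file(GENERATE OUTPUT "${wrapper}" CONTENT "#include \"${KERNEL_SRC}\"\n")
  # Give each wrapper its own flags and a define identifying the capability.
  set_source_files_properties("${wrapper}" PROPERTIES
    COMPILE_OPTIONS "${FLAGS_${cap}}"
    COMPILE_DEFINITIONS "CPU_CAPABILITY=${cap}")
  list(APPEND WRAPPERS "${wrapper}")
endforeach()

add_library(my_kernels STATIC ${WRAPPERS})
```

In a setup like this, editing `kernel.cpp` should rebuild all three wrapper objects through the compiler's normal header dependency tracking, without re-running `cmake`.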
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D32566108
Pulled By: malfet
fbshipit-source-id: ae75368fede37e7ca03be6ade3d4e4a63479440d