CUDA Kernels: Use per-operator headers (2/4) (#71213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71213
Splitting this into multiple PRs to keep the diffs more manageable.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33949903
Pulled By: malfet
fbshipit-source-id: 0859d131d1cfcfb61d3387f6f5a50a1497974b3e
(cherry picked from commit 52b6bdf2f951a810b19b9f098fc35ae03ec0b29e)