Unify treatment of warp size / wave size (#25884)
Summary:
Introduce a C10_WARP_SIZE define in Macros.h.
Kernels that previously ifdef-ed their own WARP_SIZE for ROCm vs. CUDA now use this macro. This is not a functional change; it is a refactoring that unifies these kernels on a single warp size definition.
I hope we can encourage use of this macro instead of sprinkling further WARP_SIZE definitions (or numerically hard-coded values) across the code base.
Some kernels still have their own WARP_SIZE definitions because they did not satisfy the above condition. They will be fixed in follow-up PRs.
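
For illustration, a minimal sketch of what a unified define and its use could look like. This is not copied from Macros.h: the platform check (`__HIP_PLATFORM_HCC__`), the values 64 and 32, and the helper `warps_needed` are assumptions made for this example.

```cpp
// Sketch only: the platform check and the values below are assumptions,
// not a copy of what Macros.h actually contains.
#if defined(__HIP_PLATFORM_HCC__)
#define C10_WARP_SIZE 64  // assumed wavefront size when building for ROCm
#else
#define C10_WARP_SIZE 32  // assumed warp size when building for CUDA
#endif

// Hypothetical helper: number of warps (or waves) needed to cover `threads`
// threads, rounding up. It relies on the single shared macro rather than a
// per-file WARP_SIZE or a hard-coded 32/64.
constexpr int warps_needed(int threads) {
  return (threads + C10_WARP_SIZE - 1) / C10_WARP_SIZE;
}

// Compile-time checks that hold for either value of the macro.
static_assert(warps_needed(1) == 1, "one thread still occupies one warp");
static_assert(warps_needed(C10_WARP_SIZE) == 1, "a full warp fits in one warp");
static_assert(warps_needed(C10_WARP_SIZE + 1) == 2, "overflow rounds up");
```

A kernel that used to carry its own `#ifdef`-guarded WARP_SIZE would drop that block and refer to C10_WARP_SIZE in the same places.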
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25884
Differential Revision: D17276662
Pulled By: bddppq
fbshipit-source-id: cef8e77a74ae2e5de10df816ea80b25cb2bab713