[ROCM] Navi21 Enablement 4: Normalization kernels (#73543)
Summary:
This PR is a follow-up to the following PRs:
https://github.com/pytorch/pytorch/pull/69942
https://github.com/pytorch/pytorch/pull/72682
https://github.com/pytorch/pytorch/pull/72809
We are adding support for Navi21 GPUs, which have a warp size of 32. Host code cannot rely on a compile-time constant, so it must look up the warp size dynamically when launching a kernel. Inside device functions this is not needed: the compiler substitutes the correct warp size for the C10_WARP_SIZE constant.
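As a rough illustration of the host-side change, the sketch below sizes a launch from a warp size passed in at runtime instead of a constant. It is not the code from this PR: `DeviceInfo`, `LaunchConfig`, and `make_reduce_config` are hypothetical names, and the struct stands in for a real device query (in PyTorch the value would presumably come from the current device's properties, for example via `at::cuda::warp_size()`).

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime device query; not a PyTorch type.
struct DeviceInfo {
  int warp_size;  // e.g. 32 on Navi21, per the description above
};

// Hypothetical launch parameters derived on the host.
struct LaunchConfig {
  int threads_per_block;
  int warps_per_block;
};

// Pick a block size that is a whole number of warps, using the warp size
// reported at runtime rather than a compile-time constant.
inline LaunchConfig make_reduce_config(const DeviceInfo& dev,
                                       int max_threads,
                                       int reduce_len) {
  const int w = dev.warp_size;
  assert(w > 0 && max_threads >= w && reduce_len >= 0);

  // Round the reduction length up to a multiple of the warp size.
  int threads = ((reduce_len + w - 1) / w) * w;
  // Always launch at least one full warp.
  if (threads < w) threads = w;
  // Cap at the largest multiple of the warp size that fits in max_threads.
  if (threads > max_threads) threads = (max_threads / w) * w;

  return LaunchConfig{threads, threads / w};
}
```

With the same inputs, a device reporting a warp size of 32 and one reporting 64 get the same block size here but a different number of warps per block, which is the kind of quantity a hard-coded host-side constant would get wrong.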
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73543
Reviewed By: mruberry
Differential Revision: D34558061
Pulled By: ngimel
fbshipit-source-id: 6f4c68365cefb32d7e9aa01759cbd65fb01e29b9
(cherry picked from commit e42a5763aa1937b4835a069f38b58981ba4f7222)