[ROCm] Adjust elementwise_kernel settings on ROCm (#32609)
Summary:
Recent PR https://github.com/pytorch/pytorch/issues/31974 and upcoming PR https://github.com/pytorch/pytorch/issues/32383 are changing the behavior of the elementwise_kernel infrastructure on CUDA.
To stay in sync, change the nd-loop behavior on ROCm to match CUDA for now. Once the full rework is done, the ROCm settings will likely diverge again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32609
Differential Revision: D19580121
Pulled By: ezyang
fbshipit-source-id: 4c8dcf6db3ac973e48ece6a665615cfe7d7cb764