[ROCm] Add conditions for channels last logic (#107812)
Although there are some performance benefits by enforcing NHWC convolutions as inductor's fallback method for all hardware this may not be the case. Currently on ROCm we are seeing some slow downs in gcnArch that do not have optimal NHWC implementations and would like to introduce some control on this behavior in pytorch. On ROCm MI200 series we will default to the enforced last channels behavior aligned with the rest of pytorch but on non-MI200 series we will disable the forced layout.
For now we are using torch.cuda.get_device_name(0) for this control but we will replace with gcnArchName when https://github.com/pytorch/pytorch/pull/107477 lands.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107812
Approved by: https://github.com/jataylo, https://github.com/eellison