[ROCm] Use torch._C._cuda_getArchFlags to get list of gfx archs pytorch was built for (#80498)
*even if no GPUs are available*
When building PyTorch extensions against a ROCm build of PyTorch, if the user doesn't specify a list of archs via the `PYTORCH_ROCM_ARCH` env var, we would like to default to the list of gfx archs that PyTorch was built for. To do this even in an environment where no GPUs are available, e.g. a build-only CPU node, we need a way to get that list without a GPU present. `torch.cuda.get_arch_list()` doesn't work here because it calls `torch.cuda.is_available()` first (https://github.com/pytorch/pytorch/blob/0922cc024eeafa2158c0d00396494a0ae983f8cb/torch/cuda/__init__.py#L463), which returns `False` if no GPUs are available, so `torch.cuda.get_arch_list()` returns an empty list. To get around this, we call the underlying API `torch._C._cuda_getArchFlags()` directly.
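The fallback described above can be sketched roughly as follows. This is an illustration, not the code in this PR: `resolve_rocm_archs` is a hypothetical helper, the arch-flag getter is passed in as a callable so the sketch runs without PyTorch installed, and it assumes that `PYTORCH_ROCM_ARCH` is separated by `;` or spaces and that the getter returns a whitespace-separated string or `None`.

```python
import os
from typing import Callable, List, Mapping, Optional


def resolve_rocm_archs(
    get_arch_flags: Callable[[], Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Hypothetical helper: pick the gfx archs to build an extension for."""
    requested = env.get("PYTORCH_ROCM_ARCH")
    if requested:
        # Assumption: archs are separated by ';' or spaces.
        return [a for a in requested.replace(" ", ";").split(";") if a]
    # No user override: fall back to the archs the library was built for.
    # Assumption: the getter returns e.g. "gfx900 gfx906", or None.
    flags = get_arch_flags()
    return flags.split() if flags else []
```

On a ROCm build one would presumably pass `torch._C._cuda_getArchFlags` as `get_arch_flags`. Unlike `torch.cuda.get_arch_list()`, that call is not guarded by the GPU availability check.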
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80498
Approved by: https://github.com/ezyang, https://github.com/malfet