fix CMake FindCUDA module for cross-compiling (#121590)
Fix two cross-compiling issues in `FindCUDA.cmake` (xref: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/224).
1. `setup.py` reads the cached `CUDA_TOOLKIT_ROOT_DIR`, so it must be cached.
https://github.com/pytorch/pytorch/blob/41286f1505ffb214d386d72e4b72ebd680a4a475/setup.py#L593
I also submitted it to the upstream CMake: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9323.
2. [SBSA toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=arm64-sbsa&Compilation=Cross&Distribution=Ubuntu&target_version=20.04&target_type=deb_network_cross) is in `sbsa-linux` directory. See also https://gitlab.kitware.com/cmake/cmake/-/issues/24192
I also submitted it to the upstream CMake: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9324
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121590
Approved by: https://github.com/malfet