[CUDA] Fix cuda 13 build (#26153)
Fix cuda 13 build errors and warnings.
Related: https://github.com/microsoft/onnxruntime/issues/25936
I've verified the build on Linux and Windows using the following test
settings:
### Build command line
You may need to change cuda_home and cudnn_home to match your installation
directories, and update CMAKE_CUDA_ARCHITECTURES to match your GPU.
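
If you are unsure which value to use for CMAKE_CUDA_ARCHITECTURES, the sketch below derives one from the compute capability reported by `nvidia-smi`. It is an illustration only, not part of this PR: the helper name `cc_to_cmake_arch` and the variable `CUDA_ARCH` are hypothetical, the `compute_cap` query field assumes a reasonably recent NVIDIA driver, and the `native` fallback assumes CMake 3.24 or newer.

```sh
# Hypothetical helper: convert a compute capability such as "12.0"
# into the CMake form "120-real".
cc_to_cmake_arch() {
  printf '%s-real\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Query the first GPU; fall back to "native" if nvidia-smi is
# unavailable or reports nothing.
cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
if [ -n "$cap" ]; then
  CUDA_ARCH=$(cc_to_cmake_arch "$cap")
else
  CUDA_ARCH=native
fi

echo "CMAKE_CUDA_ARCHITECTURES=$CUDA_ARCH"
```

The printed value can then be passed through `--cmake_extra_defines`. Quote the whole define if it contains a semicolon-separated list.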
#### Linux Build
```
pip install cmake ninja packaging numpy
sh build.sh --config Release --build_dir build/cuda13 --parallel --use_cuda \
--cuda_version 13.0 --cuda_home /nvida/cuda13.0/ \
--cudnn_home /nvida/cudnn9.12_cu13/ \
--build_wheel --skip_tests \
--cmake_generator Ninja \
--enable_cuda_nhwc_ops \
--use_binskim_compliant_compile_flags \
--cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=90-real;90-virtual" \
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=ON \
--cmake_extra_defines onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON
```
#### Windows Build
```
IF "%VCToolsVersion%"=="" call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
build.bat --cmake_generator "Visual Studio 17 2022" --config Release --build_dir build\cuda13 --build_wheel ^
--parallel 4 --nvcc_threads 1 --build_shared_lib ^
--use_cuda --cuda_version "13.0" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0" ^
--cudnn_home "D:\cudnn\9.13.0.50_cuda13" ^
--skip_tests ^
--use_binskim_compliant_compile_flags ^
--enable_cuda_nhwc_ops ^
--cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=native" ^
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=ON ^
--cmake_extra_defines FETCHCONTENT_TRY_FIND_PACKAGE_MODE=NEVER
```
onnxruntime_test_all.exe passes on an RTX 5060 Ti GPU, so the
binary supports Blackwell GPUs (CUDA_ARCHITECTURES=120) with
CUDA 13.0:
```
[----------] Global test environment tear-down
[==========] 1242 tests from 111 test suites ran. (83468 ms total)
[ PASSED ] 1242 tests.
```