Fix CUDA/cuDNN DLL preload paths for CUDA 13 consolidated wheel layout (#29202)
## Description
Fix https://github.com/microsoft/onnxruntime/issues/29198.
NVIDIA restructured the CUDA Python wheels starting with CUDA 13: the
per-component CUDA Toolkit packages (cublas, cufft, cuda_runtime,
cuda_nvrtc, curand, ...) were consolidated into a single
`nvidia/cu{major}` package and the `-cuNN` suffix was dropped from those
package names. This PR updates the DLL/shared-library preload logic and
the wheel dependency metadata so `onnxruntime-gpu` (and
`onnxruntime-trt-rtx`) keep working on both the legacy CUDA 12 layout
and the new CUDA 13 consolidated layout.
## Summary of Changes
### Preload logic (`onnxruntime/__init__.py`)
| File | Change |
|------|--------|
| `onnxruntime/__init__.py` | `_get_nvidia_dll_paths` now detects the CUDA 13+ consolidated layout and resolves CUDA libraries under `nvidia/cu{major}`. Windows uses an architecture sub-folder (`bin/<arch>`, e.g. `bin/x86_64`), while Linux uses a flat `lib` folder. The legacy CUDA 12 per-component paths are preserved. |
| `onnxruntime/__init__.py` | Added `build_cuda_version` and `arch` parameters (for testability and architecture override). cuDNN paths are factored out, since cuDNN keeps its own `nvidia/cudnn` package layout in both schemes. |
| `onnxruntime/__init__.py` | `print_debug_info` drops the `-cuNN` suffix from CUDA Toolkit package names for CUDA 13+ (cuDNN keeps its suffixed name). |
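A minimal sketch of the path-resolution rule described in the table above. It assumes a hypothetical helper `nvidia_lib_dirs` (not the real `_get_nvidia_dll_paths` signature) that returns directories relative to `site-packages`. The CUDA 12 component list is an illustrative subset, and the Windows architecture fallback (mapping `AMD64` to `x86_64`) is an assumption.

```python
import os
import platform

# Illustrative subset of the legacy per-component CUDA 12 packages (assumption).
LEGACY_COMPONENTS = ("cublas", "cuda_runtime", "cufft", "curand", "cuda_nvrtc")


def nvidia_lib_dirs(build_cuda_version, is_windows, arch=None, cuda=True, cudnn=True):
    """Hypothetical sketch: relative library dirs to preload, per CUDA layout."""
    major = int(build_cuda_version.split(".")[0])
    bin_or_lib = "bin" if is_windows else "lib"
    dirs = []
    if cuda:
        if major >= 13:
            # CUDA 13+: all toolkit libraries live in one nvidia/cu{major} package.
            base = os.path.join("nvidia", f"cu{major}")
            if is_windows:
                if arch is None:
                    # Assumed fallback: AMD64/x86_64 machines use bin/x86_64.
                    machine = platform.machine().lower()
                    arch = "x86_64" if machine in ("amd64", "x86_64") else machine
                dirs.append(os.path.join(base, "bin", arch))
            else:
                # Linux: flat lib folder, no architecture sub-folder.
                dirs.append(os.path.join(base, "lib"))
        else:
            # CUDA 12: one package directory per toolkit component.
            for component in LEGACY_COMPONENTS:
                dirs.append(os.path.join("nvidia", component, bin_or_lib))
    if cudnn:
        # cuDNN keeps its own nvidia/cudnn layout under both schemes.
        dirs.append(os.path.join("nvidia", "cudnn", bin_or_lib))
    return dirs
```

Passing `arch` explicitly keeps the result deterministic, which is what makes this kind of helper easy to pin in unit tests.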
### Wheel dependency metadata (`setup.py`)
| File | Change |
|------|--------|
| `setup.py` | `onnxruntime-gpu` `cuda` extras drop the `-cuNN` suffix for CUDA 13+ (`nvidia-cuda-nvrtc`, `nvidia-cuda-runtime`, `nvidia-cufft`, `nvidia-curand`); the cuDNN dependency keeps its suffixed name. |
| `setup.py` | The `onnxruntime-trt-rtx` CUDA Runtime dependency drops the `-cuNN` suffix for CUDA 13+. |
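The naming rule in the table above can be sketched the same way. `requirement_names` is a hypothetical helper, not the actual `setup.py` code; version pins are omitted, and the dictionary keys simply label the dependency groups described above.

```python
def requirement_names(cuda_major):
    """Hypothetical sketch: dependency names per group, without version pins."""
    # CUDA 13+ consolidated wheels drop the -cuNN suffix; CUDA 12 keeps it.
    suffix = "" if cuda_major >= 13 else f"-cu{cuda_major}"
    toolkit = ["nvidia-cuda-nvrtc", "nvidia-cuda-runtime", "nvidia-cufft", "nvidia-curand"]
    return {
        "cuda": [name + suffix for name in toolkit],
        # cuDNN keeps its suffixed package name under both schemes.
        "cudnn": [f"nvidia-cudnn-cu{cuda_major}"],
        # The onnxruntime-trt-rtx CUDA Runtime dependency follows the toolkit rule.
        "trt-rtx": ["nvidia-cuda-runtime" + suffix],
    }
```

For example, `requirement_names(12)["cuda"][0]` gives `"nvidia-cuda-nvrtc-cu12"`, while `requirement_names(13)["cudnn"]` still gives `["nvidia-cudnn-cu13"]`.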
### Tests (`onnxruntime/test/python/onnxruntime_test_python_preload_dlls.py`)
- New unit tests pin the expected relative paths for the CUDA 12
(legacy) and CUDA 13 (consolidated) layouts on both Windows and Linux,
the Windows arch override, the Linux flat-`lib` layout, the unchanged
cuDNN layout, and the `cuda`/`cudnn` toggles.
## Testing
- Run the new tests with `python -m pytest onnxruntime/test/python/onnxruntime_test_python_preload_dlls.py` (or `python -m unittest onnxruntime.test.python.onnxruntime_test_python_preload_dlls`).
- Backward compatibility: CUDA 12 paths and the cuDNN layout are
unchanged; only CUDA 13+ takes the new consolidated paths and unsuffixed
package names.
- Build on Linux and Windows, install with `pip install onnxruntime-gpu*.whl[cuda,cudnn]`, then confirm that `import onnxruntime; onnxruntime.preload_dlls()` runs successfully in Python.
## Checklist
- [x] Tests added/updated
- [x] No breaking changes (CUDA 12 behavior preserved)