Fix: delay CUDADriverWrapper instantiation to avoid uncaught exception (#25117)
Fix: delay CUDADriverWrapper instantiation to avoid uncaught exceptions
when CUDA is unavailable
### Description
This PR moves the static instantiation of CUDADriverWrapper from a
class-level static field to a function-local static inside
CUDADriverWrapper::GetInstance(). The CUDA driver is now loaded on the
first call to GetInstance() rather than during static initialization.
The singleton behavior is preserved; only the time of construction
changes.
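
As a rough illustration of the pattern described above, here is a minimal sketch of a function-local static singleton. `DriverWrapper` and `g_construct_count` are hypothetical stand-ins for illustration only; they are not the actual CUDADriverWrapper code.

```cpp
// Hypothetical counter, used only to observe when construction happens.
static int g_construct_count = 0;

// Hypothetical stand-in for the real wrapper class.
class DriverWrapper {
 public:
  static DriverWrapper& GetInstance() {
    // Function-local static: constructed on the first call to GetInstance(),
    // not during static initialization. Since C++11 this initialization is
    // thread-safe and runs at most once to completion.
    static DriverWrapper instance;
    return instance;
  }

  DriverWrapper(const DriverWrapper&) = delete;
  DriverWrapper& operator=(const DriverWrapper&) = delete;

 private:
  // In the real class, this is where the driver library would be loaded.
  DriverWrapper() { ++g_construct_count; }
};
```

Nothing is constructed until a caller asks for the instance, and every call returns the same object.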
### Motivation and Context
When libcuda.so.1 is not available on the system, the constructor of
CUDADriverWrapper throws an exception. Previously, this exception was
thrown during static initialization, where no handler can catch it, so
std::terminate() was called and the process aborted. With the instance
held as a function-local static inside GetInstance(), the exception
propagates to the caller of GetInstance() and can be caught by client
code (e.g., in try/catch), allowing graceful fallback when CUDA is
unavailable.
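
The fallback this enables might look like the sketch below. `FakeDriverWrapper`, `g_driver_present`, and `SelectBackend` are hypothetical names invented for this example, and the thrown `std::runtime_error` is an assumption; the real constructor may throw a different exception type.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical toggle standing in for "libcuda.so.1 could be loaded".
static bool g_driver_present = false;

// Hypothetical stand-in for the real wrapper class.
class FakeDriverWrapper {
 public:
  static FakeDriverWrapper& GetInstance() {
    // If the constructor throws, the exception leaves GetInstance() and
    // reaches the caller. The static is then not considered initialized,
    // so construction is attempted again on the next call.
    static FakeDriverWrapper instance;
    return instance;
  }

 private:
  FakeDriverWrapper() {
    if (!g_driver_present) {
      throw std::runtime_error("failed to load CUDA driver library");
    }
  }
};

// Hypothetical client code: fall back to CPU when the driver is missing.
std::string SelectBackend() {
  try {
    FakeDriverWrapper::GetInstance();
    return "cuda";
  } catch (const std::exception&) {
    return "cpu";
  }
}
```

With a class-level static field, the same throwing constructor would run before `main()` and no `try` block in client code could intercept it.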