onnxruntime
e688ef1f - Add CUDA plugin EP Sync support for IOBinding (#27919)

Committed 5 days ago
Add CUDA plugin EP Sync support for IOBinding (#27919)

## Description

This change wires the CUDA plugin EP into ORT's sync surface (see https://github.com/microsoft/onnxruntime/pull/27538) so that `IOBinding` can safely coordinate device work when inputs and outputs are bound on CUDA. It also clarifies the split between EP-level and factory-level sync-stream creation in the design doc and adds Python coverage to validate the new path with simple CUDA-bound models.

## Summary of Changes

### CUDA plugin EP implementation

| File | Change |
|------|--------|
| `onnxruntime/core/providers/cuda/plugin/cuda_ep.cc` | Registers `OrtEp::CreateSyncStreamForDevice` and `OrtEp::Sync` in `CudaEp`, adds per-session CUDA sync-stream creation, and implements a conservative device-wide sync via `cudaSetDevice` + `cudaDeviceSynchronize()`. |
| `onnxruntime/core/providers/cuda/plugin/cuda_ep.h` | Declares the new `CreateSyncStreamForDeviceImpl` and `SyncImpl` entry points on `CudaEp`. |

### Tests

| File | Change |
|------|--------|
| `onnxruntime/test/python/transformers/test_cuda_plugin_ep.py` | Adds a helper to resolve the CUDA ordinal from plugin device metadata and adds `IOBinding`-based Add and MatMul tests that bind CUDA inputs/outputs and exercise the plugin EP sync path. |

### Documentation

| File | Change |
|------|--------|
| `docs/cuda_plugin_ep/cuda_plugin_ep_design.md` | Documents that `CudaEp` owns the preferred `OrtEp::CreateSyncStreamForDevice` and `OrtEp::Sync` implementations, while `CudaEpFactory::CreateSyncStreamForDevice` remains a fallback path; also records the new `IOBinding` test coverage. |

## Testing

- Set `ORT_CUDA_PLUGIN_PATH` to the rebuilt CUDA plugin library under `build/cuda/Release` and run `python -m pytest onnxruntime/test/python/transformers/test_cuda_plugin_ep.py`.
- Verify that the new `IOBinding` Add and MatMul tests pass with CUDA-bound `OrtValue` inputs and outputs.
- Confirm that existing CUDA plugin EP behavior is unchanged for execution paths that do not use `IOBinding`.

## Motivation and Context

`IOBinding` relies on provider synchronization to ensure that asynchronous device copies are complete before dependent kernel execution continues. The CUDA plugin EP already supported sync-stream creation at the factory layer, but the per-session `OrtEp` callbacks that ORT prefers when coordinating bound CUDA execution were not wired up; this change connects them. The documentation updates make that ownership model explicit so future plugin work does not conflate the fallback factory hook with the primary EP hook.

## Checklist

- [x] Tests added/updated
- [x] Documentation updated (if applicable)
- [x] No breaking changes
- [ ] CI passes
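The `IOBinding` flow that the new Add and MatMul tests exercise can be sketched roughly as follows. This is a minimal illustration, not the code in `test_cuda_plugin_ep.py`. The helper name `run_with_cuda_iobinding` is hypothetical. The sketch assumes that `session` is an `onnxruntime.InferenceSession` already created with the CUDA plugin EP. It also assumes that `OrtValue.ortvalue_from_numpy(..., "cuda", device_id)` can place inputs on that device in your build. `device_id` is the CUDA ordinal, for example one resolved from the plugin device metadata.

```python
def run_with_cuda_iobinding(session, feeds, device_id=0):
    """Hypothetical helper: run `session` with inputs and outputs bound on CUDA.

    Assumes `session` is an onnxruntime.InferenceSession already created with the
    CUDA plugin EP, `feeds` maps input names to NumPy arrays, and `device_id` is
    the CUDA ordinal the session's device uses.
    """
    import onnxruntime as ort  # lazy import so the definition loads without ORT

    binding = session.io_binding()

    # Copy each host array into a CUDA-resident OrtValue and bind it as an input.
    # Assumption: this build can allocate "cuda" OrtValues for the plugin's device.
    for name, array in feeds.items():
        value = ort.OrtValue.ortvalue_from_numpy(array, "cuda", device_id)
        binding.bind_ortvalue_input(name, value)

    # Bind every output by device so ORT allocates the buffers on the same GPU.
    for output in session.get_outputs():
        binding.bind_output(output.name, "cuda", device_id)

    # Make sure bound inputs are ready before the run and outputs are complete
    # after it; this is the ordering that provider synchronization has to back.
    binding.synchronize_inputs()
    session.run_with_iobinding(binding)
    binding.synchronize_outputs()

    # Copy the device outputs back to host NumPy arrays for comparison.
    return binding.copy_outputs_to_cpu()
```

Take an Add model whose inputs are hypothetically named `A` and `B`. A host-side check could then be `np.testing.assert_allclose(run_with_cuda_iobinding(session, {"A": a, "B": b}, device_id)[0], a + b, rtol=1e-5)`. A MatMul check would compare against `a @ b` instead. Binding outputs by device rather than to preallocated buffers keeps the sketch short and leaves output allocation to ORT.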