onnxruntime
e9ab56fa - Adding RunOptions synchronization behaviour to C/C++ API (#14088)

Commit

3 years ago

Adding RunOptions synchronization behaviour to C/C++ API (#14088)

### Description
This exposes the already existing asynchronous execution interface of all CUDA-based EPs (CUDA and TensorRT).

### Motivation and Context
This was requested in #12216. It enables users to build an efficient data pipeline with ONNXRuntime and CUDA pre-/post-processing. PCI traffic to the CUDA device can run during inference as soon as the postprocessing has consumed the input buffer and the buffer can be overwritten. To achieve this, work has to be submitted asynchronously to the device. The screenshots below illustrate this using Nsight Systems.

Async:
<img width="1401" alt="image" src="https://user-images.githubusercontent.com/44298237/209894303-706460ed-cbdb-4be2-a2e4-0c111ec875dd.png">

Synchronous:
<img width="1302" alt="image" src="https://user-images.githubusercontent.com/44298237/209894630-1ce40925-bbd5-470d-b888-46553ab75fb9.png">

Note the gap between the two inference runs in the synchronous trace. It is caused by the PCI traffic issued between the runs and by the CPU overhead of the active synchronization.

---------

Co-authored-by: Chi Lo <chi.lo@microsoft.com>
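The overlap described in the commit message can be illustrated with a toy timeline model. This is a minimal sketch and not ONNX Runtime code: the names `Stage`, `SyncMakespan`, and `AsyncMakespan` are hypothetical, the durations are made-up inputs, and the model assumes that a free input buffer is always available and that one copy engine and one compute engine each process work in order.

```cpp
#include <algorithm>

// Toy model only: hypothetical per-batch durations in milliseconds.
struct Stage {
  double copy_ms;   // host-to-device transfer over PCI
  double infer_ms;  // inference on the device
};

// Synchronous submission: the CPU waits for each inference to finish
// before issuing the next copy, so copy and inference never overlap.
double SyncMakespan(int batches, Stage s) {
  double t = 0.0;
  for (int i = 0; i < batches; ++i) {
    t += s.copy_ms + s.infer_ms;
  }
  return t;
}

// Asynchronous submission: the copy for the next batch is queued while
// the current inference is still running. Each inference starts once
// its own copy and the previous inference have both finished.
double AsyncMakespan(int batches, Stage s) {
  double copy_done = 0.0;
  double infer_done = 0.0;
  for (int i = 0; i < batches; ++i) {
    copy_done += s.copy_ms;  // copy engine handles copies back to back
    infer_done = std::max(copy_done, infer_done) + s.infer_ms;
  }
  return infer_done;
}
```

With four batches, a 2 ms copy, and a 5 ms inference, `SyncMakespan(4, {2.0, 5.0})` yields 28 ms while `AsyncMakespan(4, {2.0, 5.0})` yields 22 ms under this model: only the first copy remains exposed, and the rest hide behind inference. Real traces also include the synchronization overhead on the CPU, which this sketch ignores.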

References

#14088 - Adding RunOptions synchronization behaviour to C/C++ API

Author

gedoensmax

Parents

cd7098fd
