Add PerThreadContext for TRT EP (#16599)

Commit

2 years ago

The TRT [doc](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#threading) suggests maintaining one execution context per thread to avoid synchronization issues. With the previous TRT EP, we did see synchronization issues when running some models multithreaded, for example FasterRCNN. This PR leverages the per-thread context implementation from the CUDA EP.

The modifications are as follows:

- Move the CUDA graph and IExecutionContext objects to the per-thread context.
- Remove the lock_guard that previously covered the whole compute_func(), and place lock_guard only in the blocks where multiple threads may update kernel function state, access the single builder, create/serialize/save the engine, save the profile, or serialize/save the timing cache.
- On CentOS, don't unload the TRT EP shared library; leave it loaded so that the destructor of thread-local data is still accessible when a thread exits.

Note: Tested this PR with onnxruntime_perf_test; the overhead of PerThreadContext is small.
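The locking change described in this commit can be illustrated with a minimal, standalone C++ sketch. It is not the TRT EP code: `FakeExecutionContext`, `SharedEngineState`, `GetPerThreadContext`, `Compute`, and `RunDemo` are hypothetical names invented for illustration, and the "context" is a plain counter rather than a TensorRT object. It only shows the general pattern, assuming a `thread_local` object per thread and a `std::lock_guard` scoped to the block that touches shared state.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread execution context (not the TensorRT API).
struct FakeExecutionContext {
  int enqueue_count = 0;
  void Enqueue() { ++enqueue_count; }
};

// Hypothetical state shared by all threads, e.g. a lazily built engine.
struct SharedEngineState {
  std::mutex mutex;
  bool engine_built = false;
  int build_count = 0;
};

// Each thread gets its own context, created on first use and destroyed at thread exit.
FakeExecutionContext& GetPerThreadContext() {
  thread_local FakeExecutionContext context;
  return context;
}

void Compute(SharedEngineState& shared) {
  {
    // Lock only the block that reads or updates shared state,
    // instead of holding the lock for the whole function.
    std::lock_guard<std::mutex> lock(shared.mutex);
    if (!shared.engine_built) {
      shared.engine_built = true;
      ++shared.build_count;
    }
  }
  // No lock here: the context belongs to the calling thread only.
  GetPerThreadContext().Enqueue();
}

struct DemoResult {
  int build_count;
  std::vector<int> per_thread_enqueues;
};

// Runs Compute() from several threads and reports what each thread's context saw.
DemoResult RunDemo(int num_threads, int calls_per_thread) {
  SharedEngineState shared;
  std::mutex result_mutex;
  std::vector<int> per_thread_enqueues;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < calls_per_thread; ++i) {
        Compute(shared);
      }
      // Guard the shared result vector while recording this thread's count.
      std::lock_guard<std::mutex> lock(result_mutex);
      per_thread_enqueues.push_back(GetPerThreadContext().enqueue_count);
    });
  }
  for (auto& thread : threads) {
    thread.join();
  }
  return {shared.build_count, per_thread_enqueues};
}
```

In this sketch, calling `RunDemo` with several threads should build the shared state once while each thread's context counts only its own calls, which is the property the narrower locking relies on.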

References

#16599 - Add PerThreadContext for TRT EP

Author

chilo-ms

Parents

56bced05

onnxruntime 73037978 - Add PerThreadContext for TRT EP (#16599)
