[TensorRT EP] Add stream sync after enqueue (#18026)
When the model is partitioned into TRT subgraphs and CUDA EP nodes, we
observed a CUDA stream synchronization issue when running with multiple
threads. Calling the stream sync API after enqueue solves this issue
without adding much performance overhead.
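
A minimal sketch of the enqueue-then-sync pattern, for illustration only.
It is not the TensorRT EP code: `FakeStream` and `RunSubgraph` are
hypothetical names, and a worker thread stands in for a CUDA stream so the
example runs without a GPU. In the real EP the corresponding steps would
presumably be the TensorRT enqueue call followed by a CUDA runtime stream
sync such as `cudaStreamSynchronize`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a CUDA stream: tasks run in order on a worker thread.
class FakeStream {
 public:
  FakeStream() : worker_([this] { Run(); }) {}

  ~FakeStream() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  // Like an asynchronous enqueue: returns before the task has run.
  void Enqueue(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
      ++pending_;
    }
    cv_.notify_all();
  }

  // Like a stream sync: blocks until every enqueued task has finished.
  void Synchronize() {
    std::unique_lock<std::mutex> lock(mu_);
    idle_cv_.wait(lock, [this] { return pending_ == 0; });
    assert(tasks_.empty());  // pending_ counts queued plus running tasks
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested and queue drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
      {
        std::lock_guard<std::mutex> lock(mu_);
        --pending_;
      }
      idle_cv_.notify_all();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::condition_variable idle_cv_;
  std::queue<std::function<void()>> tasks_;
  int pending_ = 0;
  bool stop_ = false;
  std::thread worker_;  // declared last so the members above are ready first
};

// Hypothetical subgraph run: enqueue the work, then sync before returning,
// so the caller (e.g. the next node) never reads an unfinished output.
int RunSubgraph(FakeStream& stream, int input) {
  int output = 0;
  stream.Enqueue([&output, input] {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // simulated work
    output = input * 2;
  });
  stream.Synchronize();  // the step this change adds after enqueue
  return output;
}
```

Without the `Synchronize()` call, `RunSubgraph` could return while the
enqueued work is still in flight, which is the kind of ordering problem
described above.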