onnxruntime
b4869926 - [CUDA EP] remove per-thread allocator (#5415)

Commit
5 years ago
Now that we use the legacy default stream, which is shared among all inference threads, there is no need for a per-thread allocator.

In the past, a race could happen when two threads ran concurrently on the GPU:

thread1: allocA->copyA->computeA->freeA
thread2: allocB->copyB->computeB->freeB

Note that freeA/freeB only means the buffer is available for reallocation on the CPU side; the corresponding GPU operations may not have finished yet. Thread1 and thread2 can end up using the same buffer when their alloc/free pairs are not interleaved (alloc/free itself is thread-safe).

If the GPU commands run in separate per-thread default streams, copyA/computeA may be interleaved with copyB/computeB even when the CPU-side execution is not interleaved. This would cause incorrect results if computeB uses copyA's results.

With one legacy default stream, the GPU execution order matches the CPU execution order, so if A and B get the same buffer from alloc, the corresponding copy/compute operations won't be interleaved. If the copy/compute operations are indeed interleaved, then allocA and allocB return different buffers, so there is no race either.
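The race described above can be reproduced with a toy model. The following is a minimal Python sketch, not ONNX Runtime or CUDA code: `Pool`, `enqueue`, `execute`, `shared_stream`, and `per_thread_streams` are hypothetical names invented for illustration. Streams are modeled as plain queues, and the "GPU" as a loop that runs queued commands in a chosen order.

```python
from collections import deque


class Pool:
    """Toy CPU-side allocator: free() makes a buffer reusable at once,
    even if GPU work queued on it has not run yet."""

    def __init__(self):
        self.free_list = []
        self.next_id = 0

    def alloc(self):
        if self.free_list:
            return self.free_list.pop()
        self.next_id += 1
        return self.next_id - 1

    def free(self, buf):
        self.free_list.append(buf)


def enqueue(pool, stream, name):
    """CPU side of one request: alloc -> copy -> compute -> free.
    GPU commands are only queued here, not executed."""
    buf = pool.alloc()
    stream.append(("copy", name, buf))
    stream.append(("compute", name, buf))
    pool.free(buf)
    return buf


def execute(schedule):
    """GPU side: run commands in the given order.
    A compute records whichever request's data the buffer holds."""
    memory, results = {}, {}
    for op, name, buf in schedule:
        if op == "copy":
            memory[buf] = name
        else:
            results[name] = memory[buf]
    return results


def shared_stream():
    """One stream for both threads: FIFO, so GPU order equals CPU order."""
    pool, stream = Pool(), deque()
    enqueue(pool, stream, "A")
    enqueue(pool, stream, "B")  # reuses A's buffer, but runs strictly after A
    return execute(stream)


def per_thread_streams():
    """One stream per thread: each keeps its own order,
    but nothing orders the two streams against each other."""
    pool, stream_a, stream_b = Pool(), deque(), deque()
    enqueue(pool, stream_a, "A")
    enqueue(pool, stream_b, "B")  # reuses A's buffer
    # One legal interleaving: copyB, copyA, computeB, computeA.
    schedule = [stream_b[0], stream_a[0], stream_b[1], stream_a[1]]
    return execute(schedule)


print("shared stream:     ", shared_stream())
print("per-thread streams:", per_thread_streams())
```

In the shared-stream case each compute reads its own request's data. In the per-thread case, computeB reads the data written by copyA, which is the incorrect result the message describes. The interleaving is chosen by hand here; on real hardware it would depend on scheduling.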
Author
KeDengMS