Fix race condition when using multiple threads to transfer data in ParallelLoader #5267
This pull request fixes an issue introduced by a previous pull request, which added the ability to increase the number of host-to-device transfer threads. When more than one worker thread is used to transfer data, a race condition can cause some batches to be lost:
Currently, each thread calls `queue.close_write()` when it finds no more data in the loader queue. But if one thread calls `queue.close_write()` while other threads still have data to put into the queue, those batches are lost. In that case, the `next_item()` call in this line returns `None` when it should return valid data. Ideally, `queue.close_write()` should be called only after all threads have finished writing to the queue, which avoids the race condition.
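
Conceptually, the fix is to make the last writer to finish responsible for closing the queue. Below is a minimal sketch of that pattern; `ClosableQueue`, `run_workers`, and the sentinel-based `next_item()` are hypothetical stand-ins for the loader's actual queue, not the code from this pull request.

```python
import threading
import queue


class ClosableQueue:
    """Hypothetical stand-in for the loader queue used in this PR."""

    _SENTINEL = object()

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, item):
        self._queue.put(item)

    def close_write(self):
        # Signal the reader that no more data will arrive.
        self._queue.put(self._SENTINEL)

    def next_item(self):
        # Returns None once the queue has been closed, valid data otherwise.
        item = self._queue.get()
        return None if item is self._SENTINEL else item


def run_workers(num_workers, batches, out_queue):
    """Each worker transfers its share of batches; only the last worker
    to finish calls close_write(), avoiding the race described above."""
    remaining = num_workers
    lock = threading.Lock()

    def worker(shard):
        nonlocal remaining
        for batch in shard:
            out_queue.put(batch)
        with lock:
            remaining -= 1
            if remaining == 0:
                # All writers are done, so closing the queue is now safe.
                out_queue.close_write()

    shards = [batches[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Counting active writers under a lock guarantees exactly one `close_write()` call, and only after the final `put()`; a `threading.Barrier` in front of a single designated closer would work equally well.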