xla
Fix race condition when using multiple threads to transfer data in the parallel loader
#5267
Merged

aws-tianquaw commented 2 years ago (edited)

This pull request fixes an issue introduced by a previous pull request, which added support for increasing the number of host-to-device transfer threads. When more than one worker thread is used to transfer data, a race condition can occur and cause some batches of data to be lost:

Currently, each thread calls queue.close_write() when there is no more data in the loader queue. But if one thread calls queue.close_write() while other threads still need to put data into the queue, those batches of data are lost. In that case, the next_item() call in this line returns None when it should return valid data.

Ideally, we should only call queue.close_write() after all threads have completed writing data to the queue to avoid this race condition.
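
For illustration, here is a minimal, self-contained sketch of that shutdown pattern, assuming a single consumer: only the last producer thread to finish is allowed to close the queue, so no batch written by a slower worker is dropped. The `ClosableQueue` class, `produce_batches` helper, and counter/lock bookkeeping below are hypothetical stand-ins for the loader internals, not the actual torch_xla parallel loader code.

```python
import queue
import threading


class ClosableQueue:
    """Queue whose consumer sees None once all producers are done."""

    _SENTINEL = object()

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, item):
        self._queue.put(item)

    def close_write(self):
        # Tell the consumer that no more data will arrive.
        self._queue.put(self._SENTINEL)

    def next_item(self):
        item = self._queue.get()
        return None if item is self._SENTINEL else item


def produce_batches(batches, out_queue, done_counter, lock, num_workers):
    for batch in batches:
        out_queue.put(batch)
    # Race-free shutdown: only the last worker to finish closes the queue,
    # so batches enqueued by slower workers are never dropped.
    with lock:
        done_counter[0] += 1
        if done_counter[0] == num_workers:
            out_queue.close_write()


if __name__ == "__main__":
    num_workers = 4
    out_queue = ClosableQueue()
    done_counter, lock = [0], threading.Lock()
    work = [list(range(i * 10, (i + 1) * 10)) for i in range(num_workers)]
    threads = [
        threading.Thread(
            target=produce_batches,
            args=(work[i], out_queue, done_counter, lock, num_workers),
        )
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()

    received = []
    while (item := out_queue.next_item()) is not None:
        received.append(item)
    for t in threads:
        t.join()
    assert len(received) == num_workers * 10  # no batches lost
```
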

aws-tianquaw Fix race condition when use > 1 threads to transfer data in parallel …
8f4b5243
aws-tianquaw marked this pull request as draft 2 years ago
aws-tianquaw marked this pull request as ready for review 2 years ago
JackCaoG requested a review from chandrasekhard2 2 years ago
JackCaoG commented 2 years ago

@Liyang90 Do you have cycles to take a look at this one?

chandrasekhard2 approved these changes on 2023-07-13
JackCaoG commented 2 years ago

@aws-tianquaw Do you mind rebasing this PR? Then CI should start passing.

aws-tianquaw Merge branch 'pytorch:master' into fix-parallel-loader
6cc17a01
JackCaoG merged 1dc5af55 into master 2 years ago
