pytorch
04d75d20 - Make ShufflerDataPipe deterministic for persistent DL and distributed DL (#78765) (#78927)

Fixes https://github.com/pytorch/data/issues/426

This PR introduces two main changes:
- It ensures the `ShufflerDataPipe` shares the same seed across distributed processes.
- Users can reset `shuffle` for persistent workers per epoch.

Details:
- `shared_seed` is shared across distributed and worker processes. It seeds a `shared_rng` that provides seeds to each `ShufflerDataPipe` in the pipeline.
- `worker_loop` now accepts a new `shared_seed` argument to receive this shared seed.
- The `shared_seed` is attached to `_ResumeIteration` to reset seeds per epoch for persistent workers.
- I chose not to touch `base_seed`, simply to avoid BC issues.

I used this [script](https://gist.github.com/ejguan/d88f75fa822cb696ab1bc5bc25844f47) to test the result with `world_size=4`. Please check the result at https://gist.github.com/ejguan/6ee2d2de12ca57f9eb4b97ef5a0e300b. You can see there are no duplicated or missing elements in each epoch, and with the same seed, the order of data remains the same across epochs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78765
Approved by: https://github.com/VitalyFedyunin
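The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea (not the actual PyTorch implementation): a `shared_seed` identical on every process seeds a `shared_rng`, which in turn hands the same shuffle seed to every rank's shuffler; because all ranks apply the same permutation before taking their stride of the data, the per-rank shards are disjoint and jointly cover the dataset. The function names `make_shuffler_seeds` and `shuffled_shard` are assumptions made up for this sketch.

```python
import random

def make_shuffler_seeds(shared_seed, num_shufflers):
    # A shared_rng seeded identically on every process yields the
    # same sequence of seeds for each shuffler in the pipeline.
    shared_rng = random.Random(shared_seed)
    return [shared_rng.randint(0, 2**63 - 1) for _ in range(num_shufflers)]

def shuffled_shard(data, shuffle_seed, rank, world_size):
    # Every rank shuffles the *full* dataset with the same seed,
    # then takes its own stride; identical seeds guarantee the
    # per-rank shards partition the data with no duplicates.
    order = list(data)
    random.Random(shuffle_seed).shuffle(order)
    return order[rank::world_size]

data = list(range(16))
world_size = 4
seeds = make_shuffler_seeds(shared_seed=42, num_shufflers=1)
shards = [shuffled_shard(data, seeds[0], r, world_size)
          for r in range(world_size)]
# Combining the shards recovers every element exactly once.
combined = sorted(x for shard in shards for x in shard)
```

Resetting `shared_seed` per epoch (as the PR does via `_ResumeIteration` for persistent workers) then amounts to re-running `make_shuffler_seeds` with the new epoch's seed, so each epoch gets a fresh but still rank-consistent shuffle.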