aec83ff4 - [DataLoader] Add Numpy seeding to worker of DataLoader (#56488)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56488

Given the number of requests for this feature, this PR introduces NumPy seeding as the default within each DataLoader worker.

## BC-breaking Note:

- By introducing a default numpy.random seeding strategy for DataLoader workers, users no longer need to seed workers manually via `worker_init_fn`. This PR does not affect users who currently use `worker_init_fn` to set a customized seed for workers.
- DataLoader now preserves reproducibility for users who use numpy.random within a Dataset.
- Multiprocessing (without a `worker_init_fn` defining a seed for numpy):
  - Start method `spawn`: each worker now has a seed for numpy's RNG, rather than the seed generated at import time of the NumPy module, which made the DataLoader lose reproducibility.
  - Start method `fork`: each worker not only gets the same benefit as `spawn`, but also receives a different numpy seed by default, rather than inheriting the same seed.

Using the following Dataset and script as an example:

```py
import numpy as np
import torch
import torch.multiprocessing as mp
from torch.utils.data import Dataset, DataLoader

class RandomDataset(Dataset):
    def __getitem__(self, ind):
        item = [ind, np.random.randint(1, 10000)]
        return item

    def __len__(self):
        return 20

if __name__ == '__main__':
    ctx = mp.get_context('fork')
    ds = RandomDataset()
    g = torch.Generator()
    g.manual_seed(0)
    dl = DataLoader(ds, 2, shuffle=False, num_workers=4,
                    multiprocessing_context=ctx, generator=g)
    epochs = 2
    for _ in range(epochs):
        for batch in dl:
            print(batch)
        print("====" * 10)
```

### 1.8.1: Each worker generates the same random result per iteration, and the seed is reset to the same value each epoch.
```py
tensor([[ 0, 7449], [ 1, 1519]])
tensor([[ 2, 7449], [ 3, 1519]])
tensor([[ 4, 9645], [ 5, 2387]])
tensor([[ 6, 9645], [ 7, 2387]])
tensor([[ 8, 3118], [ 9, 4552]])
=========================
tensor([[ 0, 7449], [ 1, 1519]])
tensor([[ 2, 7449], [ 3, 1519]])
tensor([[ 4, 9645], [ 5, 2387]])
tensor([[ 6, 9645], [ 7, 2387]])
tensor([[ 8, 3118], [ 9, 4552]])
=========================
```

### This PR: Each worker has a different seed at the beginning and is re-seeded for each epoch.

```py
tensor([[ 0, 8715], [ 1, 5555]])
tensor([[ 2, 6379], [ 3, 1432]])
tensor([[ 4, 3271], [ 5, 5132]])
tensor([[ 6, 4287], [ 7, 1104]])
tensor([[ 8, 8682], [ 9, 1699]])
=========================
tensor([[ 0, 1374], [ 1,  996]])
tensor([[ 2,  143], [ 3, 3507]])
tensor([[ 4, 5887], [ 5, 4730]])
tensor([[ 6, 7274], [ 7,  738]])
tensor([[ 8, 6374], [ 9, 1572]])
=========================
```

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27908486

Pulled By: ejguan

fbshipit-source-id: 5f313a30563bedeb88be214fa4beca0cefe9e4f4
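The per-worker seeding idea above can be illustrated with a small, PyTorch-free sketch. The function name `derive_worker_seeds` and the additive `base_seed + worker_id` scheme are illustrative assumptions, not the PR's exact implementation; the point is only that giving each worker a distinct seed yields distinct random streams, unlike a `fork`ed process that inherits the parent's NumPy RNG state unchanged:

```python
import random

def derive_worker_seeds(base_seed, num_workers):
    # Illustrative assumption: each worker receives its own seed derived
    # from the DataLoader generator's base seed (conceptually
    # base_seed + worker_id), instead of all workers sharing one state.
    return [base_seed + worker_id for worker_id in range(num_workers)]

# Simulate four workers, each seeding a private RNG with its derived seed
# and drawing values in the same range as the RandomDataset example above.
seeds = derive_worker_seeds(0, 4)
streams = [
    [random.Random(seed).randint(1, 10000) for _ in range(5)]
    for seed in seeds
]
```

With distinct seeds, the four streams differ from one another; re-deriving the seeds at the start of each epoch (as this PR does per epoch) re-seeds every worker rather than replaying identical state.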