[DataLoader] Add Numpy seeding to worker of DataLoader (#56488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56488
Considering the number of requests for this feature, introduce NumPy seeding by default within each DataLoader worker.
## BC-breaking Note:
- By introducing a default numpy.random seeding strategy for DataLoader workers, users no longer need to seed the workers manually via `worker_init_fn`. This PR does not affect users who already use `worker_init_fn` to set a custom seed for each worker (see the sketch after this list).
- DataLoader now preserves reproducibility for users who use numpy.random within their Dataset.
- Multiprocessing (without a `worker_init_fn` that seeds NumPy):
  - Start method `spawn`: each worker now gets its own seed for numpy.random, rather than the seed generated at NumPy import time, which previously made the DataLoader lose reproducibility.
  - Start method `fork`: each worker not only gets the same benefit as with `spawn`, but also has a different NumPy seed by default, rather than inheriting the same seed from the parent process.
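For users who prefer explicit control, `worker_init_fn` continues to work exactly as before and overrides the new default. A minimal sketch of such a hook (the helper name `seed_worker` is illustrative, not part of this PR), deriving the NumPy and `random` seeds from each worker's torch seed:
```py
import random

import numpy as np
import torch

def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() is base_seed + worker_id,
    # so each worker derives a distinct 32-bit seed for NumPy and random.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# e.g. DataLoader(ds, batch_size=2, num_workers=4, worker_init_fn=seed_worker)
```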
Using the following Dataset and script as an example:
```py
import multiprocessing as mp

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class RandomDataset(Dataset):
    def __getitem__(self, ind):
        item = [ind, np.random.randint(1, 10000)]
        return item

    def __len__(self):
        return 20

if __name__ == '__main__':
    ctx = mp.get_context('fork')
    ds = RandomDataset()
    g = torch.Generator()
    g.manual_seed(0)
    dl = DataLoader(ds, 2, shuffle=False, num_workers=4, multiprocessing_context=ctx, generator=g)
    epochs = 2
    for _ in range(epochs):
        for batch in dl:
            print(batch)
        print("====" * 10)
```
### 1.8.1:
Workers generate the same random results per iteration, and the seed is reset to the same value for each epoch.
```py
tensor([[ 0, 7449],
[ 1, 1519]])
tensor([[ 2, 7449],
[ 3, 1519]])
tensor([[ 4, 9645],
[ 5, 2387]])
tensor([[ 6, 9645],
[ 7, 2387]])
tensor([[ 8, 3118],
[ 9, 4552]])
=========================
tensor([[ 0, 7449],
[ 1, 1519]])
tensor([[ 2, 7449],
[ 3, 1519]])
tensor([[ 4, 9645],
[ 5, 2387]])
tensor([[ 6, 9645],
[ 7, 2387]])
tensor([[ 8, 3118],
[ 9, 4552]])
=========================
```
### This PR:
Each worker has a different seed at the start and is re-seeded for each epoch.
```py
tensor([[ 0, 8715],
[ 1, 5555]])
tensor([[ 2, 6379],
[ 3, 1432]])
tensor([[ 4, 3271],
[ 5, 5132]])
tensor([[ 6, 4287],
[ 7, 1104]])
tensor([[ 8, 8682],
[ 9, 1699]])
=========================
tensor([[ 0, 1374],
[ 1, 996]])
tensor([[ 2, 143],
[ 3, 3507]])
tensor([[ 4, 5887],
[ 5, 4730]])
tensor([[ 6, 7274],
[ 7, 738]])
tensor([[ 8, 6374],
[ 9, 1572]])
=========================
```
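If a Dataset needs its own NumPy `Generator` per worker, the per-worker base seed can be read via `torch.utils.data.get_worker_info()`. A minimal sketch of that pattern (this variant of `RandomDataset` is illustrative, not part of this PR):
```py
import numpy as np
import torch
from torch.utils.data import Dataset, get_worker_info

class RandomDataset(Dataset):
    def __init__(self):
        self.rng = None  # created lazily, once per worker process

    def __getitem__(self, ind):
        if self.rng is None:
            info = get_worker_info()
            # Use the worker's base seed; fall back to torch.initial_seed()
            # when num_workers=0 (no worker processes).
            seed = info.seed if info is not None else torch.initial_seed()
            self.rng = np.random.default_rng(seed)
        return [ind, int(self.rng.integers(1, 10000))]

    def __len__(self):
        return 20
```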
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27908486
Pulled By: ejguan
fbshipit-source-id: 5f313a30563bedeb88be214fa4beca0cefe9e4f4