[DataPipe] Reset Shuffler's iterator when NotStarted (#83535)
This PR changes `IterDataPipe` to always invoke `reset` when the state is `NotStarted`. The main reason is that lazy initialization code normally lives in the `reset` function, so `reset` must also run in the `NotStarted` state to initialize those lazy variables. Otherwise, each `__iter__` function would have to check whether the state is `NotStarted` or `Iterating` and manually invoke `reset` only for `NotStarted`.
This PR also makes `Shuffler` serializable together with its `buffer` and `rng_state`.
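As a rough illustration of why `reset` has to run before the first iteration, here is a minimal sketch. `LazyPipe` and its `_rng` attribute are hypothetical stand-ins, not PyTorch code, and the explicit `self.reset()` call only imitates what the `IterDataPipe` machinery is described as doing for `NotStarted`.

```python
import random


class LazyPipe:
    """Hypothetical stand-in for an IterDataPipe; not the PyTorch class."""

    def __init__(self, source):
        self.source = source
        self._rng = None  # created lazily in reset()

    def reset(self):
        # Lazy initialization lives here, so it must also run before
        # the very first iteration.
        if self._rng is None:
            self._rng = random.Random(0)

    def __iter__(self):
        # Imitates the behavior described above: reset is always invoked
        # when iteration starts from a fresh (NotStarted) pipe.
        self.reset()
        for item in self.source:
            yield item, self._rng.random()
```

Without the `reset` call, the first `self._rng.random()` would fail because `_rng` is still `None`.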
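One way to picture serializing a shuffler with its buffer and RNG state is the sketch below. `TinyShuffler`, its attribute names, and the state dictionary keys are assumptions made for illustration; the actual `Shuffler` implementation may differ.

```python
import pickle
import random


class TinyShuffler:
    """Hypothetical buffer-based shuffler; not torch's Shuffler."""

    def __init__(self, source, buffer_size=4, seed=0):
        self.source = list(source)
        self.buffer_size = buffer_size
        self._rng = random.Random(seed)
        self._buffer = []

    def __getstate__(self):
        # Store the RNG state (a plain tuple) next to the buffer contents.
        return {
            "source": self.source,
            "buffer_size": self.buffer_size,
            "buffer": list(self._buffer),
            "rng_state": self._rng.getstate(),
        }

    def __setstate__(self, state):
        self.source = state["source"]
        self.buffer_size = state["buffer_size"]
        self._buffer = state["buffer"]
        # Rebuild the RNG and restore it to the saved position.
        self._rng = random.Random()
        self._rng.setstate(state["rng_state"])


if __name__ == "__main__":
    original = TinyShuffler(range(10))
    original._buffer = [3, 1, 2]  # pretend iteration has filled the buffer
    clone = pickle.loads(pickle.dumps(original))
    assert clone._buffer == original._buffer
    # Both RNGs continue from the same point in the random stream.
    assert clone._rng.random() == original._rng.random()
```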
The following part is removed:
~I also add `_snapshot_state` to the serialization state, and during `__setstate__` only change the state to `Restored` if the original state is `Iterating`. In particular, when deserializing/serializing a `NotStarted` DataPipe (multiprocessing), we would invoke `set_seed` for `Shuffler`. We need the `DataPipe` to remain `NotStarted` to properly `reset`.~
The expected state transitions are listed below:
- Initial state: `NotStarted`
  - `iter` -> Call `reset` and change the state to `Iterating`
  - serialize/deserialize -> Keep the state as `NotStarted` (will `reset` if `iter` is called afterwards)
- Initial state: `Iterating`
  - `iter` -> Call `reset` and keep the state as `Iterating`
  - serialize/deserialize -> Change the state to `Restored`
- Initial state: `Restored`
  - `iter` -> Only change the state to `Iterating`
  - serialize/deserialize -> Not allowed
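
The transitions listed above can be written down as a small table-like sketch. The `State` enum and the `on_iter` / `on_roundtrip` helpers are hypothetical names used only for illustration, and raising `RuntimeError` for the disallowed case is an assumption; this is not the code added by the PR.

```python
from enum import Enum


class State(Enum):
    NOT_STARTED = "NotStarted"
    ITERATING = "Iterating"
    RESTORED = "Restored"


def on_iter(state):
    """Return (next_state, reset_called) when `iter` is called."""
    if state is State.RESTORED:
        # A restored pipe keeps its restored contents, so no reset.
        return State.ITERATING, False
    # NotStarted and Iterating both reset before iterating.
    return State.ITERATING, True


def on_roundtrip(state):
    """Return the state after a serialize/deserialize round trip."""
    if state is State.NOT_STARTED:
        return State.NOT_STARTED
    if state is State.ITERATING:
        return State.RESTORED
    # Assumed error type; the list above only says "Not allowed".
    raise RuntimeError("serializing a Restored DataPipe is not allowed")
```

For example, `on_iter(on_roundtrip(State.NOT_STARTED))` still reports that `reset` is called, which matches the note that a `NotStarted` pipe will `reset` if `iter` is called after deserialization.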
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83535
Approved by: https://github.com/NivekT