pytorch
d6dec1a5 - Refactor sharding data pipe into a separate file (#94095)

Commit

1 year ago

Refactor sharding data pipe into a separate file (#94095)

Move `ShardingFilterIterDataPipe` into a dedicated file.

Also, propose a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipes. A sharding data pipe behaves more like a "system/engine-level" DataPipe: it gives strong hints to the ReadingService (RS) on how to execute, and it needs first-class treatment in the RS, unlike "user-level" DataPipes, which are mostly composable `Callable[[Iterable], Iterable]`. With a dedicated parent class, `graph_settings.py` no longer needs to decide based on whether `is_shardable` and `apply_sharding` are present on a DataPipe. Open to other discussions.

Open question: Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered a `_ShardingIterDataPipe`? For example, this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints that the source is too expensive to replicate and therefore requires round-robin data exchange/dispatch.

Differential Revision: D43014692
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94095
Approved by: https://github.com/ejguan, https://github.com/NivekT
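The idea of a dedicated sharding parent class can be sketched in plain Python. This is a minimal, hypothetical illustration, not the PyTorch implementation: `IterDataPipe`, `ShardingFilter`, `Mapper`, `apply_sharding_to_graph`, and `round_robin_dispatch` are simplified stand-ins written for this example, and the real `apply_sharding` in PyTorch takes additional arguments (such as a sharding group). The graph walker finds sharding pipes with `isinstance` against the parent class instead of probing for `is_shardable`/`apply_sharding` attributes, and the round-robin helper contrasts "replicate and filter" sharding with dispatching each item from a single reader.

```python
class IterDataPipe:
    """Hypothetical stand-in for torch.utils.data.IterDataPipe."""

    def __iter__(self):
        raise NotImplementedError


class _ShardingIterDataPipe(IterDataPipe):
    """Marker parent class for "engine-level" sharding pipes (sketch only)."""

    def apply_sharding(self, num_of_instances, instance_id):
        raise NotImplementedError


class ShardingFilter(_ShardingIterDataPipe):
    """Simplified sharding filter: each instance keeps every n-th item."""

    def __init__(self, source):
        self.source = source
        self.num_of_instances = 1  # unsharded by default
        self.instance_id = 0

    def apply_sharding(self, num_of_instances, instance_id):
        if not 0 <= instance_id < num_of_instances:
            raise ValueError("instance_id must be in [0, num_of_instances)")
        self.num_of_instances = num_of_instances
        self.instance_id = instance_id

    def __iter__(self):
        # Every instance reads the full (replicated) source and keeps
        # only the items whose index maps to its own instance_id.
        for i, item in enumerate(self.source):
            if i % self.num_of_instances == self.instance_id:
                yield item


class Mapper(IterDataPipe):
    """A "user-level" pipe: just a composable Iterable -> Iterable function."""

    def __init__(self, source, fn):
        self.source = source
        self.fn = fn

    def __iter__(self):
        for item in self.source:
            yield self.fn(item)


def apply_sharding_to_graph(pipe, num_of_instances, instance_id):
    """Walk a linear pipe chain and shard every _ShardingIterDataPipe found.

    Detection uses isinstance on the parent class rather than checking
    whether particular attributes happen to exist on each pipe.
    """
    node = pipe
    while isinstance(node, IterDataPipe):
        if isinstance(node, _ShardingIterDataPipe):
            node.apply_sharding(num_of_instances, instance_id)
        node = getattr(node, "source", None)


def round_robin_dispatch(source, num_of_instances):
    """Contrast: one reader hands each item to exactly one instance."""
    buckets = [[] for _ in range(num_of_instances)]
    for i, item in enumerate(source):
        buckets[i % num_of_instances].append(item)
    return buckets
```

For example, `pipe = Mapper(ShardingFilter(range(10)), lambda x: x * 10)` followed by `apply_sharding_to_graph(pipe, 2, 1)` makes `list(pipe)` yield the mapped odd-indexed items, `[10, 30, 50, 70, 90]`. Both approaches produce the same partition here; the difference is that the filter reads the whole source in every instance, while round-robin dispatch reads it once, which is the trade-off behind the open question above.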

Author

wenleix

Committer

pytorchmergebot

Parents

59c1b502
