pytorch
ef11966a - [composable] Enable replicate + trec_shard overall (#98890)

Commit

1 year ago

[composable] Enable replicate + trec_shard overall (#98890)

replicate + trec_shard works if we shard / replicate individually, such as follows:
```
m = TestSparseNN()
shard(m.sparse)
replicate(m.dense)
```
but does not work if users do the following:
```
m = TestSparseNN()
shard(m, sharders=[...])
replicate(m)
```
Many upstream trainers use the latter use case, as sharding is not done on individual module level but rather overall module by specifying planners that contain logic for how to shard different embedding table types.

This diff enables the latter approach (while keeping the former intact), but users need to specify `ignored_modules` to ignore embedding tables in replicate(). This is similar to FSDP (class based and composable) and DDP today.

Differential Revision: [D44899155](https://our.internmc.facebook.com/intern/diff/D44899155/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98890
Approved by: https://github.com/mrshenli, https://github.com/yhcharles
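To make the effect of `ignored_modules` concrete, here is a minimal sketch that runs without a process group. It is not the code from this diff. The `TestSparseNN` layout (an `EmbeddingBag` plus a `Linear`) and the helper `params_left_for_replicate` are hypothetical, introduced only to show which parameters would remain for replication once the embedding tables are ignored.

```python
import torch.nn as nn


class TestSparseNN(nn.Module):
    """Hypothetical stand-in for the model named in the commit message."""

    def __init__(self):
        super().__init__()
        self.sparse = nn.EmbeddingBag(100, 8)  # embedding table, handled by shard()
        self.dense = nn.Linear(8, 4)           # dense part, handled by replicate()


def params_left_for_replicate(module, ignored_modules):
    """Hypothetical helper: names of parameters not owned by any ignored module."""
    ignored_ids = {id(p) for m in ignored_modules for p in m.parameters()}
    return [
        name
        for name, p in module.named_parameters()
        if id(p) not in ignored_ids
    ]


m = TestSparseNN()
replicated = params_left_for_replicate(m, ignored_modules=[m.sparse])
print(replicated)  # ['dense.weight', 'dense.bias']
```

With a process group initialized, the corresponding call described above would presumably look like `replicate(m, ignored_modules=[m.sparse])` after `shard(m, sharders=[...])`. The exact arguments depend on the trainer and planner in use.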

Author

rohan-varma

Committer

pytorchmergebot

Parents

e45fa1a5
