pytorch
bd383073 - [LT] Integrating with DDP with c10d comm ops falling back to eager (#72631)

[LT] Integrating with DDP with c10d comm ops falling back to eager (#72631)

Summary: This pull request/branch elaborates a prototype that makes all c10d communication ops fall back to eager execution in order to support DDP with LazyTensor. It is a deliberately naive prototype that demonstrates one way DDP can work with LazyTensor.

Pros: All allReduce ops are executed in exactly the same order as DDP's bucketing algorithm schedules them.

Cons: It forces the graph to be broken into small pieces whenever an allReduce occurs, so in theory it performs worse than an approach that captures the full graph. The full-graph approach will be the next prototype. For materializing tensors, it uses the .to('cuda:i') method, which in the current design of LazyTensor copies the tensor even if it is already on the correct CUDA device. It may also fail to overlap communication with computation, since the allReduces are likely triggered at trace time rather than at execution time. More evidence is needed to confirm this theory.

Test Plan: LTC_TS_CUDA=1 python lazy_tensor_core/test/ddp/ddp.py
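
For context, a minimal sketch of the kind of DDP loop the ddp.py test presumably exercises is shown below. This is plain eager PyTorch using the standard c10d/DDP API, not the actual lazy_tensor_core test; the model, hyperparameters, and device placement are illustrative assumptions. The point it illustrates is where the bucketed allReduce ops fire: inside backward(), which is where the prototype described above falls back to eager and cuts the lazy graph.

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run(rank, world_size):
    # Standard c10d process-group setup; DDP's gradient hooks issue
    # bucketed allReduce calls through this group during backward().
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}")
    model = nn.Linear(8, 8).to(device)  # toy model, illustration only
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(3):
        optimizer.zero_grad()
        out = ddp_model(torch.randn(4, 8, device=device))
        loss = out.sum()
        # backward() triggers DDP's bucketed allReduces; under the prototype,
        # each such comm op runs eagerly and breaks the lazy graph at that point.
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)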
Author: Jiewen Tan