pytorch
c5fdec22 - [LT] Allow lazy_model.mark_step to specify a device (#72683)

Commit
2 years ago
[LT] Allow lazy_model.mark_step to specify a device (#72683)

Summary: Currently this API only synchronizes tensors on the default device. This is a problem in a distributed environment, where there are multiple devices. This PR adds a parameter to the API so that the caller can specify the device on which to synchronize tensors. Please refer to the attached cuda1.py for detailed examples.

This aligns with how DDP works: the rank (device index) is often passed in from torch.multiprocessing, and models then need to be moved to that rank's device before training or inference.

An alternative is to add an API that lets users set the default device, but that seems too verbose.

Test Plan: Run the attached cuda1.py and check that the logs show tensors being synchronized on device Unknown1. (We should fix the log to show CUDA1.)
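The behavior described in the summary can be illustrated with a small toy model. This is only a sketch of the idea, not the lazy tensor implementation and not the attached cuda1.py: the `LazyRuntime` class, the `record` and `run_rank` helpers, the `DEFAULT_DEVICE` value, and the device name strings are all hypothetical names chosen for illustration. It shows a `mark_step` that flushes only the default device unless the caller passes a device, and a per-rank caller that passes its own device in the way a DDP worker might.

```python
from collections import defaultdict

# Assumption for illustration: the default device is the first CUDA device.
DEFAULT_DEVICE = "CUDA0"


class LazyRuntime:
    """Toy stand-in for a lazy tensor runtime (hypothetical, not a PyTorch API)."""

    def __init__(self):
        # Pending (not yet synchronized) tensor names, keyed by device.
        self.pending = defaultdict(list)

    def record(self, device, name):
        """Register a pending tensor on the given device."""
        self.pending[device].append(name)

    def mark_step(self, device=None):
        """Synchronize pending tensors on one device.

        With no argument, only the default device is flushed, which mirrors
        the limitation described in the summary. Passing a device flushes
        that device instead.
        """
        target = DEFAULT_DEVICE if device is None else device
        # Remove and return whatever was pending on the target device.
        return self.pending.pop(target, [])


def run_rank(runtime, rank):
    """Hypothetical per-process worker: the rank selects the device."""
    device = f"CUDA{rank}"
    runtime.record(device, f"grad_rank{rank}")
    return runtime.mark_step(device)


if __name__ == "__main__":
    rt = LazyRuntime()
    rt.record("CUDA0", "a")
    rt.record("CUDA1", "b")
    print(rt.mark_step())         # flushes only the default device
    print(rt.mark_step("CUDA1"))  # flushes the device the caller names
```

In this sketch, a worker on rank 1 that called `mark_step()` without an argument would leave its own tensors pending, which is the situation the new parameter is meant to avoid.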
Author
Jiewen Tan