Make allreduce compatible with fx ProxyTensor (#84126)
land after #83122
This PR explores solutions for 2 issues:
1. Collective comm ops are inplace ops and do not return tensors.
As a result, `make_fx` cannot include comm ops in the traced graph.
The current solution is to make comm ops return a tuple of
`(output_tensors, work_handle)`, so that
[`proxy_call`](https://github.com/pytorch/pytorch/blob/90821aab100a436424113e2306eac63f5e247ee5/torch/fx/experimental/proxy_tensor.py#L170-L172)
can handle that. It won't change the behavior of existing c10d
Python/C++ APIs, so I directly added the code to `Ops.cpp`.
2. `make_fx` does not recognize `ProcessGroup::Work` and will ignore
the `wait()` call on the work when tracing the graph. However, this
might break correctness, as the traced function could consume a
tensor before it is ready. The current solution is to create a
`CommTensor` tensor subclass that explicitly calls `wait()`. In this
PR, I am only doing this in the test, as we will need more discussion
to see if we can add this to the c10d Python implementations.
Kudos to @Chillee @wanchaol
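To make the two ideas concrete, here is a minimal plain-Python sketch of the pattern, not the actual `torch.distributed` APIs. `FakeWork`, `fake_allreduce`, and the `CommTensor` stand-in below are hypothetical names for illustration: the op returns `(output_tensors, work_handle)` so a tracer sees tensor outputs it can record, and the wrapper makes the `wait()` call explicit before the value is consumed.

```python
class FakeWork:
    """Hypothetical stand-in for ProcessGroup::Work."""
    def __init__(self):
        self.completed = False

    def wait(self):
        # In real c10d, wait() blocks until the collective has finished.
        self.completed = True


def fake_allreduce(tensors):
    # Idea 1 from the PR: instead of mutating in place and returning
    # nothing, the op returns (output_tensors, work_handle) so a tracer
    # like make_fx sees a functional op with tensor outputs to record.
    out = [sum(tensors)] * len(tensors)  # pretend SUM allreduce, one "rank"
    return out, FakeWork()


class CommTensor:
    """Sketch of the CommTensor idea (idea 2): pair a value with its
    pending work and call wait() before the value is read, so nothing
    consumes the buffer before the collective completes."""
    def __init__(self, value, work):
        self._value = value
        self._work = work

    def value(self):
        if not self._work.completed:
            self._work.wait()  # explicit wait, visible to the tracer
        return self._value
```

Usage of the sketch:

```python
outs, work = fake_allreduce([1, 2, 3])
wrapped = CommTensor(outs[0], work)
wrapped.value()  # calls wait() first, then returns 6
```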
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84126
Approved by: https://github.com/wanchaol