[C2] Revive unsafe CoalesceOp (#49402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49402
NCCLAllReduce operations can incur non-trivial overhead from launching
cooperative kernels, especially when different parts of the model execute
asynchronously. This diff revives the unsafe CoalesceOp so that multiple
operations can be fused into a single kernel.
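A minimal sketch of the coalescing idea, in plain Python rather than Caffe2.
The names `coalesce`, `buffer`, and `views` are hypothetical, and this is not
the operator's implementation. It assumes the goal is to pack several tensors
into one contiguous buffer so that a single call can cover all of them, while
per-tensor views keep aliasing that buffer:

```python
from array import array


def coalesce(chunks):
    """Pack several float sequences into one contiguous buffer.

    Returns the buffer and one writable view per input. The views alias
    the buffer, so a write through either side is visible on the other.
    """
    flat = array("d")
    bounds = []
    for chunk in chunks:
        start = len(flat)
        flat.extend(chunk)
        bounds.append((start, len(flat)))
    # Create the views only after the buffer has reached its final size.
    whole = memoryview(flat)
    views = [whole[a:b] for a, b in bounds]
    return flat, views


# Three separate "tensors" become one buffer.
buffer, views = coalesce([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])

# One pass over the fused buffer stands in for a single kernel launch
# (assumption: the real work would be one allreduce over the buffer).
for i in range(len(buffer)):
    buffer[i] *= 2.0
```

Under that assumption, the three inputs are updated by one pass instead of
three, and each view still exposes its own slice of the result.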
Test Plan:
Unit test.
The operator is also used in a later diff.
Reviewed By: xianjiec
Differential Revision: D25531206
fbshipit-source-id: 64b1c161233a726f9e2868f1059316e42a8ea1fc