Fixing SyncBN dgrad (#36382)
Summary:
Previous PR https://github.com/pytorch/pytorch/issues/22248, which added support for varying batch sizes across processes, doesn't account for mean_dy/mean_dy_xmu on the backward path, so the computed dgrad (input gradient) is wrong.
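A minimal, hypothetical sketch of why the backward reductions must be count-weighted when ranks hold different batch sizes. It is pure Python, works on one channel as plain lists, and simulates ranks in-process rather than using torch.distributed. It assumes the standard batch-norm input-gradient formula and forward statistics that are already global. The helpers `local_reductions`, `reduce_weighted`, `reduce_naive`, and `dgrad` are illustrative names, not PyTorch APIs, and this is not the code changed by this PR.

```python
import math
from typing import List, Sequence, Tuple

# (sum_dy, sum_dy_xmu, count) for one channel on one rank (hypothetical layout)
Partial = Tuple[float, float, int]


def local_reductions(dy: Sequence[float], x: Sequence[float], mean: float) -> Partial:
    """Per-rank partial sums for one channel (hypothetical helper)."""
    sum_dy = sum(dy)
    sum_dy_xmu = sum(g * (xi - mean) for g, xi in zip(dy, x))
    return sum_dy, sum_dy_xmu, len(dy)


def reduce_weighted(partials: Sequence[Partial]) -> Tuple[float, float]:
    """Correct: add up the partial sums, then divide by the global element count."""
    count = sum(p[2] for p in partials)
    mean_dy = sum(p[0] for p in partials) / count
    mean_dy_xmu = sum(p[1] for p in partials) / count
    return mean_dy, mean_dy_xmu


def reduce_naive(partials: Sequence[Partial]) -> Tuple[float, float]:
    """Wrong when batch sizes differ: averages the per-rank means."""
    n = len(partials)
    mean_dy = sum(p[0] / p[2] for p in partials) / n
    mean_dy_xmu = sum(p[1] / p[2] for p in partials) / n
    return mean_dy, mean_dy_xmu


def dgrad(dy, x, mean, invstd, weight, mean_dy, mean_dy_xmu) -> List[float]:
    """Batch-norm input gradient for one channel, given the global reductions."""
    return [
        (g - mean_dy - (xi - mean) * invstd * invstd * mean_dy_xmu) * invstd * weight
        for g, xi in zip(dy, x)
    ]


# Two simulated ranks with uneven batch sizes (1 vs 3 elements): (x, dy) per rank.
ranks = [([1.0], [1.0]), ([2.0, 3.0, 4.0], [0.0, 0.0, 0.0])]
all_x = [v for x, _ in ranks for v in x]
all_dy = [v for _, dy in ranks for v in dy]

# Forward statistics over the whole (concatenated) batch.
mean = sum(all_x) / len(all_x)
var = sum((v - mean) ** 2 for v in all_x) / len(all_x)
invstd = 1.0 / math.sqrt(var + 1e-5)
weight = 1.0

partials = [local_reductions(dy, x, mean) for x, dy in ranks]
good = reduce_weighted(partials)
bad = reduce_naive(partials)

# Single-process reference computed on the concatenated batch.
ref_means = reduce_weighted([local_reductions(all_dy, all_x, mean)])
ref = dgrad(all_dy, all_x, mean, invstd, weight, *ref_means)

synced = [g for x, dy in ranks for g in dgrad(dy, x, mean, invstd, weight, *good)]
broken = [g for x, dy in ranks for g in dgrad(dy, x, mean, invstd, weight, *bad)]
```

Here the weighted reduction gives mean_dy = 0.25 and matches the single-process reference, while averaging per-rank means gives 0.5 because the one-element rank is over-weighted. The resulting dgrad is then wrong for every element.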
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36382
Differential Revision: D20984446
Pulled By: ngimel
fbshipit-source-id: 80066eee83760b275d61e2cdd4e86facca5577fd