Allow for iterations where no module parameter is used (#19821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19821
It is possible that not a single parameter is used during an
iteration. When that happens, the `prepare_for_backward` function
marks all parameters as unused, kicks off reduction of all buckets,
*and* finalizes the reduction itself, as sketched below.
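
For illustration, here is a minimal, self-contained sketch of that control flow. It is not the actual reducer implementation; the class and method names (`ReducerSketch`, `_reduce_all_buckets`, `_finalize_backward`) are hypothetical:

```python
class ReducerSketch:
    """Illustrative stand-in for the reducer's prepare_for_backward logic."""

    def __init__(self, parameters):
        self.parameters = list(parameters)
        self.finalized = False

    def prepare_for_backward(self, used_parameters):
        unused = [p for p in self.parameters if p not in used_parameters]
        if len(unused) == len(self.parameters):
            # No autograd hook will fire this iteration, so mark every
            # parameter as unused, reduce all buckets, and finalize
            # inline rather than from an autograd callback.
            self._reduce_all_buckets()
            self._finalize_backward()
        # Otherwise the finalizer is queued via the autograd callback
        # mechanism and runs after backward completes (not shown).

    def _reduce_all_buckets(self):
        print("all-reducing buckets for", len(self.parameters), "parameters")

    def _finalize_backward(self):
        self.finalized = True
        print("reduction finalized inline")


reducer = ReducerSketch(parameters=["weight", "bias"])
reducer.prepare_for_backward(used_parameters=[])  # nothing used this step
assert reducer.finalized
```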
This is different from the prior implementation, which assumed that
autograd would produce a gradient for at least one parameter and used
the autograd callback mechanism to queue a finalizer callback. Now,
this finalizer may be executed inline; an iteration that triggers this
path is sketched below.
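
As a concrete example of the scenario this handles, consider a toy module (hypothetical, for illustration) whose forward pass can bypass every parameter; wrapped in `DistributedDataParallel` with unused-parameter detection enabled, such an iteration leaves all parameters unused:

```python
import torch
import torch.nn as nn

class GatedModel(nn.Module):
    """Toy module: on some iterations the forward pass uses no parameters."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    def forward(self, x, use_params):
        if use_params:
            return self.linear(x)
        # The output does not depend on any parameter, so no autograd
        # hook will fire during this iteration's backward pass.
        return x

model = GatedModel()
out = model(torch.randn(2, 8), use_params=False)
# `out` has no path to model.linear.weight or model.linear.bias in the
# autograd graph; under DistributedDataParallel, every parameter would
# be marked unused and the reduction finalized inline.
print(out.requires_grad)  # False: no parameter participated
```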
Reviewed By: mrshenli
Differential Revision: D15113272
fbshipit-source-id: dc91458b569cd8c106ddaeea558464b515683550