[DDP] Call ensure_prior_reduction_finished within lock
`ensure_prior_reduction_finished` reads member variables that other
threads (i.e. autograd engine threads) can modify, so call it while holding the lock.
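
For illustration, here is a minimal sketch of the pattern, not the actual reducer code. The class `SketchReducer`, its members (`mutex_`, `require_finalize_`, `pending_`), and the helpers `on_gradient_ready` and `run_demo` are all hypothetical names chosen for this example; only the idea of calling the check after acquiring the lock comes from this change.

```cpp
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical stand-in for a DDP-style reducer; all names are illustrative.
class SketchReducer {
 public:
  static constexpr int kNumHooks = 4;

  // Called from the training thread at the start of each iteration.
  void prepare_for_backward() {
    std::lock_guard<std::mutex> lock(mutex_);
    // The check reads require_finalize_, which hook threads also write,
    // so it runs only after the lock is held.
    ensure_prior_reduction_finished();
    require_finalize_ = true;
    pending_ = kNumHooks;
  }

  // Called from worker threads (standing in for autograd engine threads).
  void on_gradient_ready() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (--pending_ == 0) {
      require_finalize_ = false;  // reduction for this iteration is done
    }
  }

 private:
  // Precondition: the caller holds mutex_.
  void ensure_prior_reduction_finished() const {
    if (require_finalize_) {
      throw std::runtime_error("prior reduction has not finished");
    }
  }

  std::mutex mutex_;
  bool require_finalize_ = false;
  int pending_ = 0;
};

// Runs several iterations and returns how many completed without error.
int run_demo(int iterations) {
  SketchReducer reducer;
  int completed = 0;
  for (int i = 0; i < iterations; ++i) {
    reducer.prepare_for_backward();
    std::vector<std::thread> workers;
    for (int h = 0; h < SketchReducer::kNumHooks; ++h) {
      workers.emplace_back([&reducer] { reducer.on_gradient_ready(); });
    }
    for (auto& w : workers) {
      w.join();
    }
    ++completed;
  }
  return completed;
}
```

In this sketch the private check documents its locking precondition in a comment instead of taking the lock itself, since `std::mutex` is not recursive and the caller already holds it.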
Differential Revision: [D27474526](https://our.internmc.facebook.com/intern/diff/D27474526/)