[DDP] Call ensure_prior_reduction_finished within lock (#55074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55074
ensure_prior_reduction_finished reads member variables that other threads
(i.e. autograd engine threads) can modify, so call it while holding the lock.
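
For illustration only, the pattern described above can be sketched in Python. This is not the reducer code touched by this change: ToyReducer, its method names, and the _require_finalize flag are hypothetical stand-ins, assumed here to show a check that reads shared state only while the lock is held.

```python
import threading


class ToyReducer:
    """Hypothetical stand-in for a reducer; not PyTorch code."""

    def __init__(self):
        self._mutex = threading.Lock()
        # Shared state (assumed name): set by the forward/backward setup path,
        # cleared by a worker thread once reduction is done.
        self._require_finalize = False

    def _ensure_prior_reduction_finished(self):
        # Caller must hold self._mutex, because this reads state that
        # other threads can modify.
        if self._require_finalize:
            raise RuntimeError("prior reduction has not finished")

    def prepare_for_backward(self):
        # Take the lock first, then run the check inside the lock scope.
        with self._mutex:
            self._ensure_prior_reduction_finished()
            self._require_finalize = True

    def finalize_backward(self):
        # May be called from a different (worker) thread.
        with self._mutex:
            self._require_finalize = False
```

Running the check before acquiring the lock would let it observe the flag while another thread is in the middle of updating it; moving the call inside the `with` block removes that window.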
ghstack-source-id: 125707513
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D27474526
fbshipit-source-id: 8d43faedd6e6eeeb69e21ce3262337ab83d7ba07