[Model Averaging] Fix post_localSGD_optimizer
I found that the original implementation of `post_localSGD_optimizer.step()` is incorrect:
Every call to `averager.average_parameters()` increments the averager's built-in step counter, so it must be called exactly once per `optimizer.step()`. However, if a model has multiple param groups or params, the current implementation calls `averager.average_parameters()` several times within a single `optimizer.step()`, which advances the step counter too many times.
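The counter problem can be reproduced with the minimal pure-Python sketch below. This is not PyTorch's code. `CountingAverager`, `buggy_step`, and `fixed_step` are hypothetical names, parameters are stood in for by strings, and the averager only records which params it would have averaged. It assumes the averager averages when its counter is a multiple of `period` and then increments the counter, which follows the description above.

```python
from typing import Dict, List, Tuple


class CountingAverager:
    """Hypothetical stand-in for a periodic model averager.

    Every average_parameters() call advances the step counter; averaging is
    simulated by recording (counter value, params) when the period is hit.
    """

    def __init__(self, period: int) -> None:
        self.period = period
        self.step = 0
        self.averaged: List[Tuple[int, List[str]]] = []

    def average_parameters(self, params: List[str]) -> None:
        if self.step % self.period == 0:
            self.averaged.append((self.step, list(params)))
        self.step += 1  # incremented on EVERY call, averaged or not


def buggy_step(averager: CountingAverager, param_groups: List[Dict]) -> None:
    # Original pattern: one averaging call per param group, so the counter
    # advances len(param_groups) times per optimizer step.
    for group in param_groups:
        averager.average_parameters(group["params"])


def fixed_step(averager: CountingAverager, param_groups: List[Dict]) -> None:
    # Fixed pattern: exactly one averaging call per optimizer step,
    # covering the params of every group at once.
    averager.average_parameters([p for g in param_groups for p in g["params"]])


# Two param groups, period 2, four optimizer steps.
groups = [{"params": ["w1", "b1"]}, {"params": ["w2"]}]
buggy = CountingAverager(period=2)
fixed = CountingAverager(period=2)
for _ in range(4):
    buggy_step(buggy, groups)
    fixed_step(fixed, groups)

print(buggy.step, buggy.averaged)
print(fixed.step, fixed.averaged)
```

With the buggy pattern the counter reaches 8 after four optimizer steps. The first group is averaged on every step, and `w2` is never averaged. With the fixed pattern the counter reaches 4, and all params are averaged every second step, as the period intends.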
Related proposals, since hierarchical SGD can also be supported through `post_localSGD_optimizer`: https://github.com/pytorch/pytorch/issues/73382, https://github.com/pytorch/pytorch/issues/71325
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74737
Approved by: https://github.com/mrshenli