[Model Averaging] Skip model averaging for the first K steps (#61207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61207
Model averager now must be combined with post-localSGD DDP communication hook. It will skip model averaging for the first K steps, because post-localSGD communication hook will run global gradient averaging during this phase.
Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 133371335
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager
Reviewed By: pritamdamania87
Differential Revision: D29523738
fbshipit-source-id: 3fa9611046e1c0afa4bda78aa3ba200fa2a5fa4b