[Gradient Compression] Check start_PowerSGD_iter > 1 and add guidance on tuning PowerSGD configs. (#51427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51427
A user reported a failure when `start_PowerSGD_iter` is set to 1. This is because allocating memory for the error tensors somehow overlaps with the bucket rebuilding process at iteration 1.
Check `start_PowerSGD_iter > 1` instead of `start_PowerSGD_iter >= 1`, so that a value of 1 is now rejected.
Also add a unit test, `test_invalid_powerSGD_state`, and some guidance on tuning PowerSGD configs.
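For illustration only, the tightened check can be sketched in plain Python as below. This is a minimal standalone sketch, not the code in this PR: the helper name `check_start_iter`, the choice of `ValueError`, and the error message are assumptions made for the example, and it does not import or call PyTorch.

```python
def check_start_iter(start_PowerSGD_iter: int) -> int:
    """Hypothetical helper mirroring the stricter validation described above.

    Rejects any value <= 1, since (per the report) allocating the error
    tensors at iteration 1 overlaps with bucket rebuilding.
    """
    # Old behavior accepted start_PowerSGD_iter >= 1; the new check requires > 1.
    if start_PowerSGD_iter <= 1:
        raise ValueError(
            f"Expected start_PowerSGD_iter > 1, got {start_PowerSGD_iter}"
        )
    return start_PowerSGD_iter


def is_valid_start_iter(value: int) -> bool:
    """Convenience wrapper (also hypothetical): True if the value passes the check."""
    try:
        check_start_iter(value)
        return True
    except ValueError:
        return False
```

Under these assumptions, `is_valid_start_iter(1)` is rejected while `is_valid_start_iter(2)` passes, which is the boundary that `test_invalid_powerSGD_state` is meant to exercise.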
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120834126
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_invalid_powerSGD_state
Reviewed By: rohan-varma
Differential Revision: D26166897
fbshipit-source-id: 34d5b64bb3dd43acb61d792626c70e6c8bb44a5d