[Gradient Compression] Update the default value of start_powerSGD_iter and update the docstring (#55272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55272
1. Set 1K as the default value of `start_powerSGD_iter` for practicality. The original default of 10 is usually too small for real use cases, and the new default of 1K is consistent with PyTorch Lightning.
2. Update the docstring of `start_powerSGD_iter` to remind users to set a value no less than the number of warm-up steps, if any.
3. Update some unit tests to start PowerSGD early.
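
For illustration only, here is a minimal sketch of how a caller might pick `start_powerSGD_iter` under the guidance above. The helper names `choose_start_powersgd_iter` and `make_powersgd_state` are hypothetical and not part of this change, and taking the larger of the default and the warm-up steps is an assumption about one reasonable policy, not something the docstring mandates.

```python
def choose_start_powersgd_iter(warmup_steps: int = 0, default: int = 1000) -> int:
    """Hypothetical helper: return a start_powerSGD_iter value that is
    no less than the number of warm-up steps (assumed policy: the larger
    of the default and warmup_steps)."""
    if warmup_steps < 0:
        raise ValueError("warmup_steps must be non-negative")
    return max(default, warmup_steps)


def make_powersgd_state(process_group=None, warmup_steps: int = 0):
    """Hypothetical wrapper that builds a PowerSGDState with the chosen value."""
    # Imported lazily so the helper above can be used without torch installed.
    from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

    return powerSGD.PowerSGDState(
        process_group=process_group,
        matrix_approximation_rank=1,
        start_powerSGD_iter=choose_start_powersgd_iter(warmup_steps),
    )


# Typical usage with an already constructed DistributedDataParallel model
# (here called ddp_model, which is assumed to exist):
#
#   from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD
#   state = make_powersgd_state(warmup_steps=5000)
#   ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)
```

With no warm-up this keeps the new default of 1K; with a longer warm-up, PowerSGD compression starts only once the warm-up has finished.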
ghstack-source-id: 125707662
Test Plan: waitforbuildbot
Reviewed By: shuyingsunshine21
Differential Revision: D27553388
fbshipit-source-id: 40076419bc85755c0c0b64b79ba914b241085fcc