[Gradient Compression] Make orthogonalization_epsilon configurable in PowerSGDState (#55738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55738
Make orthogonalization_epsilon a configurable argument of PowerSGDState, with 0 as the default value.
Setting this epsilon to 0 can accelerate convergence and improve accuracy for some use cases.
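For context, here is a minimal sketch of where such an epsilon typically enters a Gram-Schmidt orthogonalization. It assumes the epsilon is added to each column norm before dividing. This is a pure-Python illustration, not the code in the PowerSGD hook, and `orthogonalize_columns` is a hypothetical name.

```python
import math


def orthogonalize_columns(matrix, epsilon=0.0):
    """Gram-Schmidt over the columns of a row-major list-of-lists matrix.

    Illustrative only: `epsilon` is added to each column norm before the
    division (an assumption about how such a term is typically used).
    """
    rows, cols = len(matrix), len(matrix[0])
    # Work on a column-major copy so each column is a plain list.
    columns = [[matrix[r][c] for r in range(rows)] for c in range(cols)]
    for i in range(cols):
        norm = math.sqrt(sum(x * x for x in columns[i]))
        denom = norm + epsilon
        if denom == 0.0:
            # With epsilon == 0 an all-zero column cannot be normalized.
            raise ZeroDivisionError("zero-norm column; use a small positive epsilon")
        columns[i] = [x / denom for x in columns[i]]
        # Remove the component along column i from every later column.
        for j in range(i + 1, cols):
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            columns[j] = [b - dot * a for a, b in zip(columns[i], columns[j])]
    # Convert back to row-major layout.
    return [[columns[c][r] for c in range(cols)] for r in range(rows)]


def column_norm(matrix, c):
    """Euclidean norm of column c of a row-major matrix."""
    return math.sqrt(sum(row[c] ** 2 for row in matrix))
```

With `epsilon=0.0` the resulting columns have unit norm up to floating-point rounding, whereas a positive epsilon leaves them slightly shorter than unit length. The trade-off in this sketch is that a zero epsilon needs explicit handling of all-zero columns.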
Test Plan:
unit tests
f264687105
f264675194
Reviewed By: shuyingsunshine21
Differential Revision: D27694971
fbshipit-source-id: b61528c6c817127974acdc4635bccf607532287f