[Gradient Compression] Report compression rate for batched PowerSGD hook (#55103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55103
Previously, the compression rate was only reported in the PowerSGD hook. Also report this metric in the batched PowerSGD hook for comprehensive experimentation.
Computing the sizes before and after compression is straightforward, because there is only one matrix factorization per bucket and no accumulation within the bucket is needed.
1) The size before compression is the input tensor size.
2) The size after compression is the combined size of P and Q, each of which has `square_side_length * state.matrix_approximation_rank` elements.
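
For illustration, the two sizes above can be combined into a compression rate roughly as follows. This is a minimal sketch, not the code in this PR: the function name `batched_compression_rate` is hypothetical, and it assumes that `square_side_length` is the ceiling of the square root of the flattened bucket length (i.e. the bucket is padded to a square matrix) and that the rate is defined as size before divided by size after.

```python
import math


def batched_compression_rate(total_length: int, matrix_approximation_rank: int) -> float:
    """Hypothetical helper: estimate the compression rate for one bucket.

    Assumption: the flattened bucket of `total_length` elements is padded
    to a square matrix whose side is ceil(sqrt(total_length)).
    """
    if total_length <= 0 or matrix_approximation_rank <= 0:
        raise ValueError("total_length and matrix_approximation_rank must be positive")

    # Integer ceiling of the square root, avoiding floating-point rounding.
    side = math.isqrt(total_length)
    square_side_length = side + (1 if side * side < total_length else 0)

    # 1) Size before compression: the input tensor size.
    numel_before = total_length

    # 2) Size after compression: P and Q together, each with
    #    square_side_length * matrix_approximation_rank elements.
    numel_after = 2 * square_side_length * matrix_approximation_rank

    return numel_before / numel_after


if __name__ == "__main__":
    # A bucket of 1,000,000 elements at rank 1 gives a 1000 x 1000 matrix.
    print(batched_compression_rate(1_000_000, 1))
```

Under these assumptions the rate can fall below 1 for very small buckets, since P and Q together may hold more elements than the original tensor.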
ghstack-source-id: 125399028
Test Plan: Tested by running scripts/wayi/torch/power_sgd.py locally.
Reviewed By: deadlybulb
Differential Revision: D27474295
fbshipit-source-id: a2225e85be03ab20238f01014d5ec9ae1787c4fb