[Gradient Compression] Refactor tensor grouping in PowerSGD (#52981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52981
There is no need for a hard boundary between rank-1 tensors and high-rank tensors: per the `_should_compress` function, even a high-rank tensor is left uncompressed when compression would not save enough bandwidth.
Therefore, refactor and simplify the tensor grouping logic. This addresses the comment in https://github.com/pytorch/pytorch/pull/52541#discussion_r580867311
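The grouping criterion can be sketched roughly as follows. This is a hedged illustration, not the exact PyTorch implementation: the function name `should_compress`, the `min_compression_rate` default, and the return shape are assumptions for exposition. The idea is that a rank-k approximation of an n x m gradient matrix transmits k * (n + m) elements instead of n * m, so compression is only worthwhile when that ratio clears a threshold; rank-1 tensors and small high-rank tensors both naturally fail the check, so no separate hard boundary between them is needed.

```python
def should_compress(num_rows: int, num_cols: int,
                    matrix_approximation_rank: int,
                    min_compression_rate: float = 2.0):
    """Decide whether low-rank compression of an (n x m) gradient saves
    enough bandwidth. Returns (decision, uncompressed_size, compressed_size).

    NOTE: illustrative sketch only; names and defaults are assumptions,
    not the exact signature used in torch.distributed.algorithms.
    """
    uncompressed_size = num_rows * num_cols
    # A rank-k approximation sends P (n x k) and Q (m x k).
    compressed_size = (num_rows + num_cols) * matrix_approximation_rank
    decision = compressed_size * min_compression_rate < uncompressed_size
    return decision, uncompressed_size, compressed_size


def group_tensors(shapes, matrix_approximation_rank: int = 1):
    """Split gradients into one compressed group and one uncompressed group,
    using only the bandwidth check above (no special-casing of rank-1 tensors)."""
    to_compress, uncompressed = [], []
    for shape in shapes:
        n, m = shape
        decision, _, _ = should_compress(n, m, matrix_approximation_rank)
        (to_compress if decision else uncompressed).append(shape)
    return to_compress, uncompressed
```

For example, a 1000 x 1000 gradient easily passes the check at rank 1 (2000 elements sent instead of 1,000,000), while a 4 x 4 gradient does not, so it falls into the uncompressed group alongside rank-1 tensors.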
ghstack-source-id: 122997032
Test Plan:
waitforbuildbot
Already LGTMed by PowerSGD paper author.
Ads1x (completed):
https://www.internalfb.com/intern/tupperware/details/job/?handle=priv3_global%2Fmast_hpc%2Ftsm_hpc-wayi_ads_10x_POWER_SGD_gpu8_2021-02-28_15-29.trainer&tatwTabs=tasks&task_id=0&task_tab=TASK_LOGS
Detectron2:
1) Before refactoring:
f254353864
Accuracy: 39.972
Overall training speed: 67498 iterations in 6:15:42 (0.3340 s / it)
2) After refactoring:
f254353380
Accuracy: 39.944
Overall training speed: 67498 iterations in 6:09:41 (0.3286 s / it)
Reviewed By: rohan-varma
Differential Revision: D26713689
fbshipit-source-id: 12cfcb65feaa2a2d94e3c7793073031f13828305