[Gradient Compression] PowerSGD comm hook (#48060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48060
Implement a PowerSGD variant that operates on a batched, flattened tensor with zero padding.
This version does not need to handle 1D tensors and multi-dimensional tensors in the input separately, and hence does not need to create two asynchronous future chains.
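The core idea, in a minimal sketch (illustrative only, not the PR's implementation; helper names like `powersgd_compress_decompress` and the `matrix_rows` parameter are made up here):

```python
import torch

def powersgd_compress_decompress(grads, matrix_rows=128, rank=1):
    # Flatten all gradients in the bucket into one 1D tensor and zero-pad it
    # so it reshapes into a single 2D matrix of shape (matrix_rows, cols).
    flat = torch.cat([g.reshape(-1) for g in grads])
    cols = -(-flat.numel() // matrix_rows)  # ceil division
    padded = torch.zeros(matrix_rows * cols, device=flat.device, dtype=flat.dtype)
    padded[: flat.numel()] = flat
    M = padded.view(matrix_rows, cols)

    # Low-rank projection: P = M Q, orthogonalize P, then Q = M^T P.
    # In DDP, P and Q would each be all-reduced; because the whole bucket is
    # one matrix, a single asynchronous future chain suffices.
    Q = torch.randn(cols, rank, device=M.device, dtype=M.dtype)
    P = M @ Q
    P, _ = torch.linalg.qr(P)  # orthonormalize the columns of P
    Q = M.t() @ P

    # Decompress: rank-`rank` approximation, then strip padding and unflatten.
    approx_flat = (P @ Q.t()).reshape(-1)[: flat.numel()]
    out, offset = [], 0
    for g in grads:
        out.append(approx_flat[offset : offset + g.numel()].view_as(g))
        offset += g.numel()
    return out
```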
Potential optimizations:
1) Consider FP16 compression throughout PowerSGD.
2) Warm start and save one matrix multiplication per iteration.
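For context, a hook like this would be registered on the DDP model roughly as follows (a sketch assuming the later public API; the module path `torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook`, the `batched_powerSGD_hook` name, and the `PowerSGDState` arguments may differ from this PR's internal code):

```python
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

# Assumes `ddp_model` is a torch.nn.parallel.DistributedDataParallel instance
# and the default process group has already been initialized.
state = powerSGD.PowerSGDState(process_group=None, matrix_approximation_rank=1)
ddp_model.register_comm_hook(state, powerSGD.batched_powerSGD_hook)
```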
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117105938
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
Reviewed By: jiayisuse
Differential Revision: D24843692
fbshipit-source-id: f44200b1fd6e12e829fc543d21ab7ae086769561