c07babbc - [Gradient Compression] Divide by world size before all_reduce to avoid overflow (#57410)

Commit
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57410

FP16 gradient compression can run into an 'inf' issue: summing FP16 gradients across ranks may overflow FP16's representable range. Switching to division by the world size before the allreduce avoids this problem.

ghstack-source-id: 127877083

Test Plan:
before change: f268909897
after change: f270950609

If you still see 'grad_norm = inf' after enabling the FP16 hook, you can resume the training and turn off the hook.

Reviewed By: SciPioneer

Differential Revision: D28128628

fbshipit-source-id: 0b6648637713e4f321e39c9ccb645a6b6f1750a0
Author: Weiyi Zheng