DeepSpeed
11aa880e - Fix Zero3 contiguous grads, reduce scatter false accuracy issue (#4321)

Commit
2 years ago
Fix Zero3 contiguous grads, reduce scatter false accuracy issue (#4321)

This is a corner case encountered when running Zero3 on the flan-t5 HF model. The HF auto bucket size becomes exactly hidden_size^2, which is also the exact size of some of the params. Due to the bug, those params are not reduced during backward.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
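To make the size coincidence concrete, here is a minimal sketch (not DeepSpeed code) that lists the parameters whose element count equals hidden_size^2, the bucket size the commit message says HF's auto setting produces. The helper name `params_matching_bucket`, the parameter names, and the shapes are hypothetical and chosen only for illustration; the sketch does not reproduce the bug or the fix itself.

```python
from math import prod


def params_matching_bucket(param_shapes, hidden_size):
    """Return names of params whose element count equals hidden_size**2.

    Assumption: the reduce bucket size is hidden_size * hidden_size,
    as described in the commit message for the HF "auto" setting.
    """
    bucket_size = hidden_size * hidden_size
    return [
        name
        for name, shape in param_shapes.items()
        if prod(shape) == bucket_size
    ]


# Hypothetical shapes for illustration; not taken from a real checkpoint.
hidden_size = 512
shapes = {
    "attn.q.weight": (512, 512),   # square projection: 512 * 512 elements
    "ffn.wi.weight": (1024, 512),  # larger than the bucket
    "layer_norm.weight": (512,),   # much smaller than the bucket
}

matches = params_matching_bucket(shapes, hidden_size)
print(matches)
```

Any square hidden_size x hidden_size weight lands exactly on the bucket boundary, which is the case the commit message describes as not being reduced during backward.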