DeepSpeed
d229ff17 - Zero3 Fix allreduce optimization for extra large tensor (#3832)

Commit
2 years ago
Grad tensors that don't fit in the bucket flat buffer are not added to it, but they are still added to params_in_ipg_bucket. If such tensors exist, use reduce_scatter of params_in_ipg_bucket instead of allreduce, since allreduce assumes all grads are in ipg_bucket_flat_buffer.

Add test for reduce_scatter=false.

Fix padding to use zeros instead of undefined values.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
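The decision described in the commit message can be illustrated with a small, framework-free sketch. This is not DeepSpeed's code: `IpgBucket`, `choose_reduction`, and `pad_with_zeros` are hypothetical names, plain Python lists stand in for tensors, and the "does not fit" check is simplified to a comparison against the whole buffer capacity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IpgBucket:
    """Toy stand-in for a gradient bucket (hypothetical, not the DeepSpeed class)."""
    capacity: int  # number of elements the flat buffer can hold
    flat_buffer: List[float] = field(default_factory=list)
    params_in_bucket: List[List[float]] = field(default_factory=list)
    has_oversized_grad: bool = False

    def add_grad(self, grad: List[float]) -> None:
        # Every grad is tracked in the params list...
        self.params_in_bucket.append(grad)
        # ...but only grads that fit are copied into the flat buffer.
        if len(grad) > self.capacity:
            self.has_oversized_grad = True
        else:
            self.flat_buffer.extend(grad)


def choose_reduction(bucket: IpgBucket, reduce_scatter_enabled: bool) -> str:
    """Pick the collective: allreduce is only safe if the flat buffer holds every grad."""
    if reduce_scatter_enabled or bucket.has_oversized_grad:
        return "reduce_scatter"
    return "allreduce"


def pad_with_zeros(values: List[float], multiple: int) -> List[float]:
    """Pad to a multiple of `multiple` with explicit zeros, never uninitialized values."""
    remainder = len(values) % multiple
    if remainder == 0:
        return list(values)
    return list(values) + [0.0] * (multiple - remainder)


bucket = IpgBucket(capacity=4)
bucket.add_grad([1.0, 2.0])   # fits, copied into the flat buffer
bucket.add_grad([1.0] * 6)    # too large, tracked in the params list only
path = choose_reduction(bucket, reduce_scatter_enabled=False)
```

In this sketch the oversized grad forces the reduce_scatter path even though reduce scatter was disabled, because the flat buffer no longer contains every grad that the bucket tracks.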