DeepSpeed
9f4a8763 - Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2 (#2999)

Committed 2 years ago
* try to fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2
* fix format error
* fix format issue
* add TODO for integrated testing of TP and ZeRO 1/2/3
* fix default pg error

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>