DeepSpeed
9ebeaec7 - zero3 performance optimizations (#3622)

2 years ago
zero3 performance optimizations (#3622)

* Remove dead code

params_already_reduced is not used.

* Prevent evaluation of debug strings

Debug strings are evaluated even when logging is disabled.

* Use contiguous gradients tensor reduce scatter between ranks

Use allreduce instead of reduce scatter. Lower CPU overhead.

* Move overflow tracker to optimizer.step

Don't check overflow in gradients for every bucket.
Do the overflow check once on the grad flat buffer just before the optimizer step.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
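
The "Prevent evaluation of debug strings" item can be illustrated with a minimal sketch using Python's standard `logging` module. This is not the code from the commit, and the message does not say how the change was made. The names `ExpensiveRepr`, `eager_debug`, `guarded_debug`, and the logger name `zero3_sketch` are hypothetical, chosen only to show why a message built with an f-string costs CPU time even when the debug level is off.

```python
import logging

# Hypothetical logger with DEBUG disabled, as in a normal training run.
logger = logging.getLogger("zero3_sketch")
logger.setLevel(logging.WARNING)


class ExpensiveRepr:
    """Hypothetical stand-in for an object whose debug summary is costly."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1  # count how often the summary is actually built
        return "expensive summary"


def eager_debug(obj):
    # The f-string is formatted before logger.debug() can discard it.
    logger.debug(f"reducing {obj}")


def guarded_debug(obj):
    # The string is only built when DEBUG is enabled for this logger.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(f"reducing {obj}")


param = ExpensiveRepr()
eager_debug(param)
eager_calls = param.calls                  # formatting ran despite disabled logging
guarded_debug(param)
guarded_calls = param.calls - eager_calls  # formatting was skipped
```

In a hot path that runs once per gradient bucket, the guarded form avoids the string construction entirely. Passing arguments lazily, as in `logger.debug("reducing %s", obj)`, is another common option.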