DeepSpeed
b712babe - Correctness fix PP+ZeRO for gradient accumulation (#1264)

Commit
4 years ago
Correctness fix PP+ZeRO for gradient accumulation (#1264)

* pass GAS boundary state from PP -> ZeRO
* formatting

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
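The commit message only names the change: the gradient accumulation steps (GAS) boundary state is passed from pipeline parallelism (PP) to ZeRO. The toy sketch that follows illustrates that idea under one assumption: the pipeline engine drives the micro-batch schedule, so it is the component that knows which micro-batch closes an accumulation window, and it writes that flag into a ZeRO-style optimizer before each backward pass. Every class, attribute, and function name here (`ToyZeroOptimizer`, `ToyPipelineEngine`, `all_reduce_mean`, `ready_to_reduce`) is hypothetical, and the data-parallel reduction is simulated in a single process. This is not DeepSpeed's implementation or API.

```python
from typing import List


class ToyZeroOptimizer:
    """Hypothetical ZeRO-like optimizer state for one data-parallel rank.

    Not DeepSpeed's API: a simplified stand-in used only for illustration.
    """

    def __init__(self) -> None:
        self.local_grad = 0.0         # gradient accumulated across micro-batches
        self.ready_to_reduce = False  # True once the accumulation window is closed
        # Written by the engine before every backward pass (hypothetical
        # attribute modelling the GAS boundary state handed from PP to ZeRO).
        self.is_gradient_accumulation_boundary = False

    def backward(self, grad: float) -> None:
        self.local_grad += grad
        # Gradients count as final only on the boundary micro-batch.
        self.ready_to_reduce = self.is_gradient_accumulation_boundary


def all_reduce_mean(optimizers: List[ToyZeroOptimizer]) -> float:
    """Simulated data-parallel all-reduce: average the accumulated gradients."""
    if not all(o.ready_to_reduce for o in optimizers):
        raise RuntimeError("GAS boundary was not signalled to the optimizer")
    return sum(o.local_grad for o in optimizers) / len(optimizers)


class ToyPipelineEngine:
    """Hypothetical pipeline engine that owns the micro-batch schedule."""

    def __init__(self, optimizers: List[ToyZeroOptimizer], gas: int) -> None:
        self.optimizers = optimizers
        self.gas = gas  # gradient accumulation steps (micro-batches per step)

    def train_batch(self, grads_per_rank: List[List[float]]) -> float:
        # grads_per_rank[r][m] is rank r's gradient for micro-batch m.
        for m in range(self.gas):
            boundary = m == self.gas - 1  # only the schedule knows the last micro-batch
            for r, opt in enumerate(self.optimizers):
                opt.is_gradient_accumulation_boundary = boundary  # PP -> ZeRO handoff
                opt.backward(grads_per_rank[r][m])
        reduced = all_reduce_mean(self.optimizers)
        # Reset per-step buffers so the next batch starts a fresh window.
        for opt in self.optimizers:
            opt.local_grad = 0.0
            opt.ready_to_reduce = False
        return reduced


engine = ToyPipelineEngine([ToyZeroOptimizer(), ToyZeroOptimizer()], gas=4)
print(engine.train_batch([[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 2.0]]))  # 6.0
```

If the engine never writes the flag, `all_reduce_mean` raises instead of silently reducing a partial accumulation, which is the sketch's way of making a missing PP -> ZeRO handoff visible.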