DeepSpeed
c1af73f7 - Improving memory utilization of Z2+MoE (#2079)

Committed 3 years ago
Improving memory utilization of Z2+MoE (#2079)

* Shards expert parameter groups
* Do upscaling, the optimizer step, and deletion of fp32 grads one by one on each parameter group in ZeRO-2

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
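Why processing parameter groups one by one lowers memory can be illustrated with a simplified, hypothetical sketch in plain Python. This is not DeepSpeed's implementation. The function names, the list-of-floats stand-in for tensors, and the plain SGD update are all assumptions made for illustration. The sketch counts how many fp32 gradient elements are alive at once. Upscaling every group before stepping keeps all groups' fp32 grads in memory together. Upscaling, stepping, and freeing one group at a time bounds the peak by the largest single group.

```python
from typing import List

# Hypothetical stand-ins: one list of fp32 master weights per parameter group,
# and matching low-precision gradients per group (plain floats in this sketch).
Params = List[List[float]]
Grads = List[List[float]]


def step_all_groups_at_once(groups: Params, grads: Grads, lr: float = 0.1) -> int:
    """Upscale every group's grads first, then step; return peak live fp32 grad elements."""
    fp32_grads = [[float(g) for g in group_grads] for group_grads in grads]
    peak = sum(len(g) for g in fp32_grads)  # all groups' fp32 grads are alive together
    for params, group_fp32 in zip(groups, fp32_grads):
        for i, g in enumerate(group_fp32):
            params[i] -= lr * g  # simple SGD update as a placeholder optimizer
    del fp32_grads
    return peak


def step_one_group_at_a_time(groups: Params, grads: Grads, lr: float = 0.1) -> int:
    """Upscale, step, and free fp32 grads per group; return peak live fp32 grad elements."""
    peak = 0
    for params, group_grads in zip(groups, grads):
        group_fp32 = [float(g) for g in group_grads]  # upscale only this group
        peak = max(peak, len(group_fp32))  # only this group's fp32 grads are alive
        for i, g in enumerate(group_fp32):
            params[i] -= lr * g  # same placeholder SGD update
        del group_fp32  # release before moving on to the next group
    return peak
```

For three groups of sizes 4, 3, and 5, the all-at-once variant holds 12 fp32 gradient elements at its peak. The per-group variant holds at most 5, and both produce identical updated weights.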