DeepSpeed
c1af73f7 - Improving memory utilization of Z2+MoE (#2079)
Commit
3 years ago
Improving memory utilization of Z2+MoE (#2079)

* Shards expert parameter groups
* Do upscaling, optimizer and deletion of fp32 grads one-by-one on each parameter group in zero-2

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
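
To illustrate why the two changes in this commit message work together, here is a minimal sketch in plain Python. It is not DeepSpeed code: `shard_group`, `peak_all_at_once`, `peak_one_by_one`, and the `max_group_size` parameter are hypothetical names, and it assumes that "upscaling" means materializing an fp32 copy of a group's gradients before the optimizer step. Under those assumptions, the sketch counts how many fp32 gradient elements are alive at the same time.

```python
from typing import List


def shard_group(param_sizes: List[int], max_group_size: int) -> List[List[int]]:
    """Greedily split one parameter group (given as element counts) into
    sub-groups whose total size stays at or below max_group_size.
    A single parameter larger than the limit gets a group of its own."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in param_sizes:
        # Close the current sub-group if adding this parameter would overflow it.
        if current and current_size + size > max_group_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


def peak_all_at_once(groups: List[List[int]]) -> int:
    """Upscale every group, then step, then free: all fp32 grads coexist."""
    return sum(sum(g) for g in groups)


def peak_one_by_one(groups: List[List[int]]) -> int:
    """Upscale, step, and free one group at a time: only the largest
    group's fp32 grads are alive at any moment."""
    return max(sum(g) for g in groups)


if __name__ == "__main__":
    expert_params = [4, 4, 4, 4, 4, 4]  # hypothetical element counts
    unsharded = [expert_params]
    sharded = shard_group(expert_params, max_group_size=8)

    print(sharded)                      # [[4, 4], [4, 4], [4, 4]]
    print(peak_all_at_once(sharded))    # 24
    print(peak_one_by_one(unsharded))   # 24
    print(peak_one_by_one(sharded))     # 8
```

In this toy model, processing groups one by one only lowers the peak when the groups are small, which is why a single large expert group gains nothing until it is sharded.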
References
#2079 - Improving memory utilization of Z2+MoE
Author
siddharth9820
Parents
b0523787