Improving memory utilization of Z2+MoE #2079
add maximum param group size to moe
47af7289
process optimizer groups individually
8d31edb0
correct timer placement
e6702d88
tested with 1.3B 2.7B and 6.7B
7dd3da28
Merge branch 'master' into zero2_optim_tiling
be0d2fe4
correction in DeepSpeedCPUAdam
b02a34f4
torch 1.8.0 backwards compatibility
57f64c16
Merge branch 'master' into zero2_optim_tiling
563d3eb0
modify optimizer groups dynamically
1d9852cd
correction for DSCpuAdam
af4da67f
tjruwase approved these changes on 2022-07-12
Merge branch 'master' into zero2_optim_tiling
2ff9812b
restored comments
41a7d161
Merge branch 'zero2_optim_tiling' of github.com:microsoft/DeepSpeed i…
e9a1cd33
remove print statements
3209c3b3