DeepSpeed
7240abf3 - Samyamr/grad acc stage2 (#338)

Commit
5 years ago
Samyamr/grad acc stage2 (#338)

* Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation
* Gradient Accumulation support for Stage 2. Model tests added to test the feature
* formatting
* Update deepspeed_light.py removing comment
* Update ds_config_func_bs8_zero1.json reverting this file back. It's not needed for this PR
* defining baseline prefix

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>