Samyamr/grad acc stage2 (#338)
* Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation
* Gradient Accumulation support for Stage 2. Model tests added to test the feature
* formatting
* Update deepspeed_light.py
removing comment
* Update ds_config_func_bs8_zero1.json
reverting this file back. It's not needed for this PR
* defining baseline prefix
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>