DeepSpeed
stage_1_and_2.py: do gradient scale only for fp16
#3166
Merged