DeepSpeed
set adamw_mode default true (follows FusedAdam and < 0.3.11 logic)
#844
Merged
