Megatron-DeepSpeed
a branch combining layer-norm-auto-sync and ds_ckpt_reshape
#292
Open