DeepSpeed
skip torch.zeros and tensor.copy_ when model parallel is not used
#2479
Merged
