DeepSpeed
Fix the sequence-parallelism for the dense model architecture
#4530
Merged
