DeepSpeed
Fixing the reshape bug in sequence parallel alltoall, which corrupted all QKV data
#5664
Merged
