DeepSpeed
ffe0af23 - Fix the bug of deepspeed sequence parallel working with batch size larger than 1 (#5823)

Committed 1 year ago
Modified the `alltoall` function. Verified the results with only `TP`:

![image](https://github.com/user-attachments/assets/9bdd8942-3565-418f-b7be-614293b2f2f6)

---------

Co-authored-by: Jinghan Yao <yjhmitweb@ascend-rw02.ten.osc.edu>
Co-authored-by: Sam Ade Jacobs <samjacobs@microsoft.com>
Co-authored-by: Jinghan Yao <yjhmitweb@ascend-rw01.ten.osc.edu>
Co-authored-by: Logan Adams <loadams@microsoft.com>
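The commit message does not describe the change to `alltoall` itself. The following is a hedged illustration of how a sequence-parallel all-to-all can be correct at batch size 1 and wrong at larger batch sizes. It simulates, with NumPy in a single process, the sequence-to-head exchange used in Ulysses-style sequence parallelism. It is not DeepSpeed's `alltoall` code. The batch-first `[B, S/P, H, D]` layout, the helper names `simulate_all_to_all`, `_exchange`, `seq_to_head_naive`, `seq_to_head_fixed`, and `check`, and the specific reshape bug are all hypothetical. The simulated exchange is modeled on the dim-0 chunking of `torch.distributed.all_to_all_single`.

```python
import numpy as np


def simulate_all_to_all(send_bufs):
    """Single-process stand-in for an all-to-all over dim 0 (hypothetical helper).

    send_bufs[i] is rank i's send buffer with a leading axis of size P;
    rank j receives chunk j from every rank, stacked in source-rank order.
    """
    world = len(send_bufs)
    return [np.stack([send_bufs[i][j] for i in range(world)]) for j in range(world)]


def _exchange(local_shards):
    # Each local shard is [B, S/P, H, D] (assumed batch-first layout). Split the
    # heads into P groups and move the group axis to the front, so chunk j holds
    # the heads destined for rank j, then run the simulated all-to-all.
    world = len(local_shards)
    b, s, h, d = local_shards[0].shape
    send = [x.reshape(b, s, world, h // world, d).transpose(2, 0, 1, 3, 4)
            for x in local_shards]
    return simulate_all_to_all(send), (b, s, h, d, world)


def seq_to_head_naive(local_shards):
    recv, (b, s, h, d, world) = _exchange(local_shards)
    # Hypothetical bug: each received buffer is [P, B, S/P, H/P, D]; reshaping it
    # straight to [B, S, H/P, D] interleaves batches and is only right when B == 1.
    return [r.reshape(b, world * s, h // world, d) for r in recv]


def seq_to_head_fixed(local_shards):
    recv, (b, s, h, d, world) = _exchange(local_shards)
    # Move the batch axis ahead of the source-rank axis before merging
    # (source rank, S/P) into the full sequence length S.
    return [r.transpose(1, 0, 2, 3, 4).reshape(b, world * s, h // world, d)
            for r in recv]


def check(batch, seq=8, heads=4, dim=2, world=2):
    rng = np.random.default_rng(0)
    x = rng.standard_normal((batch, seq, heads, dim))
    s, h = seq // world, heads // world
    shards = [x[:, r * s:(r + 1) * s] for r in range(world)]       # sequence-sharded input
    expected = [x[:, :, j * h:(j + 1) * h] for j in range(world)]  # head-sharded target
    naive_ok = all(np.array_equal(a, e)
                   for a, e in zip(seq_to_head_naive(shards), expected))
    fixed_ok = all(np.array_equal(a, e)
                   for a, e in zip(seq_to_head_fixed(shards), expected))
    return naive_ok, fixed_ok


print(check(batch=1))  # (True, True)
print(check(batch=2))  # (False, True)
```

With `batch=1` the size-1 batch axis makes both orderings identical. A layout mistake of this kind can therefore pass single-sample tests and only surface once the batch size exceeds 1.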