sequence parallel with communication overlap #5691
fix ds-sp grad scale for zero0
cb15ffa1
enable o compute async
a037a53f
enable qk bwd async all2all
42d12849
fwd optimi
6919af43
fix1 remove linear arg, remove note
39596ac4
async qkv fwd, optimi cpu ,make fwd call fast
eb760c01
update
c7d3374a
refine code
70a6d0c9
refine code
65afd895
Revert "fix ds-sp grad scale for zero0"
4b3518ed
Merge remote-tracking branch 'upstream/master' into sp_overlap_comm
634d6d93
fix format
54b5ce3d
fix format
c9f0c0ad
refine code
0862aa37
add register for v, ensuring they launch on a single thread.
1c596dd6
Merge branch 'master' into sp_overlap_comm
96e76962
remove v
2fbbd5eb
remove v
765a664f
fix notes and format
171eb67e
Merge branch 'master' into sp_overlap_comm
020ab5f8
tohtana approved these changes on 2024-07-22
loadams merged 17ed7c77 into master 1 year ago