DeepSpeed
Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2
#2999
Merged

tjruwase merged 39 commits into deepspeedai:master from YizhouZ:yizhou/fix
YizhouZ
YizhouZ * try to fix broadcast error on multi-node training with ZeroStage3 a…
1f5a38ae
YizhouZ requested a review from jeffra 2 years ago
YizhouZ requested a review from tjruwase 2 years ago
YizhouZ requested a review from samyam 2 years ago
YizhouZ requested a review from mrwyattii 2 years ago
YizhouZ Merge branch 'master' into yizhou/fix
e2d27366
tjruwase Merge branch 'master' into yizhou/fix
aad8c5ef
YizhouZ Merge branch 'master' into yizhou/fix
53d5414c
YizhouZ changed the title from "Try to fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2" to "Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2" 2 years ago
YizhouZ * fix format error
6db8e2e3
YizhouZ
tjruwase
tjruwase approved these changes on 2023-03-20
tjruwase
YizhouZ
YizhouZ Merge branch 'master' into yizhou/fix
61f09d70
YizhouZ Merge branch 'master' into yizhou/fix
28778c03
YizhouZ Merge branch 'master' into yizhou/fix
9bf11e23
abhilash1910 commented on 2023-03-27
YizhouZ Merge branch 'master' into yizhou/fix
7f4e3fac
YizhouZ Merge branch 'master' into yizhou/fix
24b98634
YizhouZ Merge branch 'master' into yizhou/fix
eb7d8623
YizhouZ Merge branch 'master' into yizhou/fix
d397366e
YizhouZ Merge branch 'master' into yizhou/fix
3987cd29
YizhouZ Merge branch 'master' into yizhou/fix
330363d4
YizhouZ Merge branch 'master' into yizhou/fix
6f230231
YizhouZ Merge branch 'master' into yizhou/fix
50bd160f
tjruwase
YizhouZ
YizhouZ Merge branch 'master' into yizhou/fix
2ec06001
YizhouZ * fix format issue
c43d7cd3
YizhouZ Merge branch 'master' into yizhou/fix
972f4723
tjruwase
YizhouZ Merge branch 'master' into yizhou/fix
38470b83
YizhouZ Merge branch 'master' into yizhou/fix
85e713f3
YizhouZ * add TODO for integrated testing of TP and ZeRO 1/2/3
bf10543f
YizhouZ
tjruwase Merge branch 'master' into yizhou/fix
6144678e
tjruwase enabled auto-merge (squash) 2 years ago
YizhouZ Merge branch 'master' into yizhou/fix
7c509369
YizhouZ Merge branch 'master' into yizhou/fix
ad8fc9ba
YizhouZ Merge branch 'master' into yizhou/fix
03972ca7
YizhouZ Merge branch 'master' into yizhou/fix
9e15ab0f
YizhouZ
tjruwase Merge branch 'master' into yizhou/fix
2b5499c9
tjruwase Merge branch 'master' into yizhou/fix
e0803acd
abhilash1910
tjruwase Merge branch 'master' into yizhou/fix
705cce57
tjruwase
tjruwase Merge branch 'master' into yizhou/fix
d55f6bbe
abhilash1910
tjruwase Merge branch 'master' into yizhou/fix
62601e8c
tjruwase Merge branch 'master' into yizhou/fix
144c85ed
YizhouZ fix default pg error
ac9aff0d
disabled auto-merge 2 years ago
Head branch was pushed to by a user without write access
YizhouZ
tjruwase Merge branch 'master' into yizhou/fix
d9ff81af
tjruwase Merge branch 'master' into yizhou/fix
74fc1cca
YizhouZ
YizhouZ Merge branch 'master' into yizhou/fix
9d54e1a8
tjruwase
tjruwase Merge branch 'master' into yizhou/fix
35d4e693
tjruwase added merge-queue
tjruwase Merge branch 'master' into yizhou/fix
4f9dc6e9
tjruwase merged 9f4a8763 into master 2 years ago
zte-tcb
YizhouZ deleted the yizhou/fix branch 2 years ago
