onnxruntime
Pipeline Parallel Experimental Python API
#5815
Merged

wschin merged 87 commits into master from wechi/pppy1
wschin wschin requested a review from BowenBao BowenBao 5 years ago
wschin wschin requested a review from liqunfu liqunfu 5 years ago
wschin wschin requested a review from spandantiwari spandantiwari 5 years ago
wschin wschin requested a review from thiagocrepaldi thiagocrepaldi 5 years ago
wschin wschin requested a review 5 years ago
tlh20
tlh20 requested changes on 2020-11-20
wschin wschin force pushed from 1bb0f425 to 7805515d 5 years ago
xadupre
xadupre commented on 2020-11-30
tlh20
tlh20 requested changes on 2020-11-30
wschin wschin force pushed from 7805515d 5 years ago
wschin wschin force pushed to d20e13cf 5 years ago
wschin wschin force pushed to d3000169 5 years ago
wschin
wschin commented on 2020-12-07
wschin
wschin commented on 2020-12-07
wschin
wschin commented on 2020-12-07
wschin
wschin commented on 2020-12-07
wschin
wschin commented on 2020-12-07
wschin
wschin commented on 2020-12-07
wschin
wschin commented on 2020-12-07
wschin wschin force pushed 5 years ago
wschin wschin force pushed 5 years ago
wschin wschin force pushed 5 years ago
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
xzhu1900
xzhu1900 commented on 2020-12-08
wschin wschin force pushed 5 years ago
wschin wschin force pushed to 9f71b313 5 years ago
jupvfranco
jupvfranco commented on 2020-12-09
wschin wschin force pushed 5 years ago
wschin wschin changed the title from "[WIP] Pipeline Parallel Experimental Python API" to "Pipeline Parallel Experimental Python API" 5 years ago
wschin wschin force pushed 5 years ago
wschin wschin force pushed 5 years ago
wschin wschin force pushed to d9ff7416 5 years ago
wschin wschin force pushed 5 years ago
wschin wschin force pushed to c36a748e 5 years ago
wschin wschin force pushed 5 years ago
wschin wschin force pushed 5 years ago
wschin wschin force pushed to 099c9f1d 5 years ago
thiagocrepaldi
thiagocrepaldi requested changes on 2020-12-14
wschin wschin force pushed from 7516043d 5 years ago
wschin wschin force pushed to 6b7f5628 5 years ago
thiagocrepaldi
thiagocrepaldi commented on 2020-12-16
wschin wschin force pushed to 244e23ab 5 years ago
wschin Compile code
1c19f19f
wschin Add missing reference
75140c02
wschin No 0-length vector
f6855118
wschin Generate partitions
73c7b971
wschin Fixes some bugs
d53884a2
wschin Run pipeline
912920af
xzhu1900 add cut info support to fe
2fa7b1e4
tensor slice function and tests
afb7f971
Run model parallel with sub-batches
1ab4d819
wschin Passing in tensors to slice
78a2d50e
wschin Finally run 2-stage for MLP
5314b5b5
wschin Enable NCCL
161c5eec
wschin Remove print's
9e6d57cc
wschin Refactorize tensor slicing helper
be554b93
wschin Rename tensorhelper.* to tensor_helper.*
8050857f
wschin This is a combination of 3 commits.
dff154fc
wschin Isolate pipeline code in PipelineTrainingSession
64f1ca70
wschin Clean TrainingSession and PipelineTrainingSession
0fe6e334
wschin Merge InsertEventOps into SetEventSynchronization
a17af6b5
wschin Address comments
2252e279
wschin Fix tests and address a comment
1b4f3121
wschin Fix a test
02688d64
wschin Fix windows build
89f190b2
wschin Try fix AMD build
b6944dd6
wschin Try
864ad2ce
wschin Fix typo
84cfb41f
fix build failures of ROCM EP
75336c13
wschin Put value rules and key rules into "schema"
fe6d2138
wschin Switch to runnable solution
c2eb9cbe
wschin Change comment to reflect code change
631ded0c
wschin Also fix sliced_axes following sliced_schema strategy
15dc2a3f
wschin Fix a memory bug caused by accidently-changed macro variable (should …
6f9b23b9
wschin Fix tests
571235ef
wschin Reach agreement with frontend team on trainer's option API
b440f28f
wschin Print info to debug distributed CI
7cc7d35b
wschin Initialize MPIContext in NCCL
583891f9
wschin Revert "Print info to debug distributed CI"
f7e93304
wschin Add missed changes
37f8c1dc
wschin wschin force pushed to 37f8c1dc 5 years ago
wschin Add pipeline parallel Python test
4992e80b
wschin Relax DxHxP check
47335b7c
wschin Fix syntax
9587928f
wschin Fix a dead lock
7b3d49e9
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
f3e48e62
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
4b199864
wschin Address a missed comment
bb756dd9
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
11136118
wschin wschin force pushed 5 years ago
wschin
wschin commented on 2020-12-28
wschin Run PP test for real
29d70652
wschin wschin force pushed to 29d70652 5 years ago
baijumeswani
baijumeswani commented on 2020-12-28
baijumeswani
baijumeswani commented on 2020-12-28
baijumeswani
baijumeswani commented on 2020-12-28
wschin Fix test
7c3a05d0
wschin wschin force pushed to 7c3a05d0 5 years ago
wschin Reorg test folder
550bd6a3
wschin Try fix import path
4504d617
wschin wschin force pushed to 4504d617 5 years ago
wschin Try setting cwd
d8e09f18
wschin wschin force pushed to d8e09f18 5 years ago
wschin Cast int to string
329ec068
wschin Add ort path
2d4fce50
wschin wschin force pushed to 2d4fce50 5 years ago
thiagocrepaldi
thiagocrepaldi commented on 2021-01-04
wschin
wschin commented on 2021-01-05
wschin Reorder compute
e7cdc776
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
d4b899a8
wschin Error out when no enough GPUs
6f645e4e
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
b41e54dc
wschin Add file removed by merging
34eae712
jupvfranco
jupvfranco commented on 2021-01-06
jupvfranco
jupvfranco commented on 2021-01-06
wschin Add utils for concatenating tensors
1f09c586
thiagocrepaldi
thiagocrepaldi commented on 2021-01-06
thiagocrepaldi
thiagocrepaldi commented on 2021-01-06
wschin Fix builds wo cuda
03677030
wschin wschin force pushed 5 years ago
wschin Address Python comments
e1c1ca76
wschin wschin force pushed to e1c1ca76 5 years ago
wschin Fix CI tests
0c303655
wschin Fix CI tests
9dd08720
wschin Fix typos
797e8105
wschin Fix CI tests
ce4483d4
wschin Modify trainer's options
b9e18f15
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
e61aa00e
wschin Fix a test
317d4de9
wschin Add and fix mixed-precision test
e3c73b81
wschin Polish
be22f71d
wschin Set loss and loss scaling differently for PP and non-PP
47283e70
wschin wschin force pushed to 47283e70 5 years ago
wschin Polish
ea94416a
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
1435b392
tlh20
tlh20 dismissed these changes on 2021-01-13
thiagocrepaldi
thiagocrepaldi requested changes on 2021-01-13
wschin Address comments
7635d5fb
wschin wschin dismissed their stale review via 7635d5fb 5 years ago
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
ab13a6ec
wschin Fix
492f76a4
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
0da14581
Update orttraining/orttraining/python/training/orttrainer_options.py
c17f3f21
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
519da604
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
c950f05f
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
5e69177b
Update orttraining/orttraining/python/training/orttrainer_options.py
a9cebe9f
Update orttraining/orttraining/python/training/orttrainer_options.py
40d8a6d6
Update orttraining/orttraining/python/training/orttrainer_options.py
fb5476b1
thiagocrepaldi
thiagocrepaldi commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
8eb29754
thiagocrepaldi
thiagocrepaldi dismissed these changes on 2021-01-14
wschin Merge remote-tracking branch 'public/master' into wechi/pppy1
14fe3fea
wschin Merge branch 'wechi/pppy1' of github.com:microsoft/onnxruntime into w…
2a55e2e7
wschin wschin dismissed their stale review via 2a55e2e7 5 years ago
SherlockNoMad
SherlockNoMad approved these changes on 2021-01-15
liqunfu
liqunfu approved these changes on 2021-01-15
wschin wschin merged 8ce252ca into master 5 years ago
wschin wschin deleted the wechi/pppy1 branch 5 years ago

Assignees
No one assigned