onnxruntime
Pipeline Parallel Experimental Python API
#5815
Merged
wschin merged 87 commits into master from wechi/pppy1
wschin
requested a review
from
BowenBao
5 years ago
wschin
requested a review
from
liqunfu
5 years ago
wschin
requested a review
from
spandantiwari
5 years ago
wschin
requested a review
from
thiagocrepaldi
5 years ago
wschin
requested a review
5 years ago
tlh20
requested changes on 2020-11-20
wschin
force pushed
from
1bb0f425
to
7805515d
5 years ago
xadupre
commented on 2020-11-30
tlh20
requested changes on 2020-11-30
wschin
force pushed
from
7805515d
5 years ago
wschin
force pushed
to
d20e13cf
5 years ago
wschin
force pushed
to
d3000169
5 years ago
wschin
commented on 2020-12-07
wschin
commented on 2020-12-07
wschin
commented on 2020-12-07
wschin
commented on 2020-12-07
wschin
commented on 2020-12-07
wschin
commented on 2020-12-07
wschin
commented on 2020-12-07
wschin
force pushed
5 years ago
wschin
force pushed
5 years ago
wschin
force pushed
5 years ago
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
xzhu1900
commented on 2020-12-08
wschin
force pushed
5 years ago
wschin
force pushed
to
9f71b313
5 years ago
jupvfranco
commented on 2020-12-09
wschin
force pushed
5 years ago
wschin
changed the title
[WIP] Pipeline Parallel Experimental Python API
Pipeline Parallel Experimental Python API
5 years ago
wschin
force pushed
5 years ago
wschin
force pushed
5 years ago
wschin
force pushed
to
d9ff7416
5 years ago
wschin
force pushed
5 years ago
wschin
force pushed
to
c36a748e
5 years ago
wschin
force pushed
5 years ago
wschin
force pushed
5 years ago
wschin
force pushed
to
099c9f1d
5 years ago
thiagocrepaldi
requested changes on 2020-12-14
wschin
force pushed
from
7516043d
5 years ago
wschin
force pushed
to
6b7f5628
5 years ago
thiagocrepaldi
commented on 2020-12-16
wschin
force pushed
to
244e23ab
5 years ago
Compile code
1c19f19f
Add missing reference
75140c02
No 0-length vector
f6855118
Generate partitions
73c7b971
Fixes some bugs
d53884a2
Run pipeline
912920af
add cut info support to fe
2fa7b1e4
tensor slice function and tests
afb7f971
Run model parallel with sub-batches
1ab4d819
Passing in tensors to slice
78a2d50e
Finally run 2-stage for MLP
5314b5b5
Enable NCCL
161c5eec
Remove print's
9e6d57cc
Refactorize tensor slicing helper
be554b93
Rename tensorhelper.* to tensor_helper.*
8050857f
This is a combination of 3 commits.
dff154fc
Isolate pipeline code in PipelineTrainingSession
64f1ca70
Clean TrainingSession and PipelineTrainingSession
0fe6e334
Merge InsertEventOps into SetEventSynchronization
a17af6b5
Address comments
2252e279
Fix tests and address a comment
1b4f3121
Fix a test
02688d64
Fix windows build
89f190b2
Try fix AMD build
b6944dd6
Try
864ad2ce
Fix typo
84cfb41f
fix build failures of ROCM EP
75336c13
Put value rules and key rules into "schema"
fe6d2138
Switch to runnable solution
c2eb9cbe
Change comment to reflect code change
631ded0c
Also fix sliced_axes following sliced_schema strategy
15dc2a3f
Fix a memory bug caused by accidently-changed macro variable (should …
6f9b23b9
Fix tests
571235ef
Reach agreement with frontend team on trainer's option API
b440f28f
Print info to debug distributed CI
7cc7d35b
Initialize MPIContext in NCCL
583891f9
Revert "Print info to debug distributed CI"
f7e93304
Add missed changes
37f8c1dc
wschin
force pushed
to
37f8c1dc
5 years ago
Add pipeline parallel Python test
4992e80b
Relax DxHxP check
47335b7c
Fix syntax
9587928f
Fix a dead lock
7b3d49e9
Merge remote-tracking branch 'public/master' into wechi/pppy1
f3e48e62
Merge remote-tracking branch 'public/master' into wechi/pppy1
4b199864
Address a missed comment
bb756dd9
Merge remote-tracking branch 'public/master' into wechi/pppy1
11136118
wschin
force pushed
5 years ago
wschin
commented on 2020-12-28
Run PP test for real
29d70652
wschin
force pushed
to
29d70652
5 years ago
baijumeswani
commented on 2020-12-28
baijumeswani
commented on 2020-12-28
baijumeswani
commented on 2020-12-28
Fix test
7c3a05d0
wschin
force pushed
to
7c3a05d0
5 years ago
Reorg test folder
550bd6a3
Try fix import path
4504d617
wschin
force pushed
to
4504d617
5 years ago
Try setting cwd
d8e09f18
wschin
force pushed
to
d8e09f18
5 years ago
Cast int to string
329ec068
Add ort path
2d4fce50
wschin
force pushed
to
2d4fce50
5 years ago
thiagocrepaldi
commented on 2021-01-04
wschin
commented on 2021-01-05
Reorder compute
e7cdc776
Merge remote-tracking branch 'public/master' into wechi/pppy1
d4b899a8
Error out when no enough GPUs
6f645e4e
Merge remote-tracking branch 'public/master' into wechi/pppy1
b41e54dc
Add file removed by merging
34eae712
jupvfranco
commented on 2021-01-06
jupvfranco
commented on 2021-01-06
Add utils for concatenating tensors
1f09c586
thiagocrepaldi
commented on 2021-01-06
thiagocrepaldi
commented on 2021-01-06
Fix builds wo cuda
03677030
wschin
force pushed
5 years ago
Address Python comments
e1c1ca76
wschin
force pushed
to
e1c1ca76
5 years ago
Fix CI tests
0c303655
Fix CI tests
9dd08720
Fix typos
797e8105
Fix CI tests
ce4483d4
Modify trainer's options
b9e18f15
Merge remote-tracking branch 'public/master' into wechi/pppy1
e61aa00e
Fix a test
317d4de9
Add and fix mixed-precision test
e3c73b81
Polish
be22f71d
Set loss and loss scaling differently for PP and non-PP
47283e70
wschin
force pushed
to
47283e70
5 years ago
Polish
ea94416a
Merge remote-tracking branch 'public/master' into wechi/pppy1
1435b392
tlh20
dismissed these changes on 2021-01-13
thiagocrepaldi
requested changes on 2021-01-13
Address comments
7635d5fb
wschin
dismissed their stale review via
7635d5fb
5 years ago
Merge remote-tracking branch 'public/master' into wechi/pppy1
ab13a6ec
Fix
492f76a4
thiagocrepaldi
commented on 2021-01-14
thiagocrepaldi
commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
0da14581
Update orttraining/orttraining/python/training/orttrainer_options.py
c17f3f21
thiagocrepaldi
commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
519da604
thiagocrepaldi
commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
c950f05f
thiagocrepaldi
commented on 2021-01-14
thiagocrepaldi
commented on 2021-01-14
thiagocrepaldi
commented on 2021-01-14
thiagocrepaldi
commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
5e69177b
Update orttraining/orttraining/python/training/orttrainer_options.py
a9cebe9f
Update orttraining/orttraining/python/training/orttrainer_options.py
40d8a6d6
Update orttraining/orttraining/python/training/orttrainer_options.py
fb5476b1
thiagocrepaldi
commented on 2021-01-14
Update orttraining/orttraining/python/training/orttrainer_options.py
8eb29754
thiagocrepaldi
dismissed these changes on 2021-01-14
Merge remote-tracking branch 'public/master' into wechi/pppy1
14fe3fea
Merge branch 'wechi/pppy1' of github.com:microsoft/onnxruntime into w…
2a55e2e7
wschin
dismissed their stale review via
2a55e2e7
5 years ago
SherlockNoMad
approved these changes on 2021-01-15
liqunfu
approved these changes on 2021-01-15
wschin
merged
8ce252ca
into master
5 years ago
wschin
deleted the wechi/pppy1 branch
5 years ago
Reviewers
thiagocrepaldi
tlh20
baijumeswani
xadupre
jupvfranco
xzhu1900
BowenBao
spandantiwari
Assignees
No one assigned
Labels
None yet
Milestone
No milestone