onnxruntime
Add Distributed Checkpointing support
#3639
Merged
ashbhandare merged 15 commits into master from aibhanda/distributed_checkpoint
ashbhandare requested a review 5 years ago
ashbhandare added the training label
ashbhandare requested a review from thiagocrepaldi 5 years ago
ashbhandare requested a review from jessebenson 5 years ago
ashbhandare requested a review from SherlockNoMad 5 years ago
ashbhandare force pushed from 29463ea9 to 257086a6 5 years ago
thiagocrepaldi requested changes on 2020-04-23
jessebenson commented on 2020-04-24
ashbhandare changed the base branch from ort_training to master 5 years ago
jessebenson dismissed these changes on 2020-04-28
ashbhandare dismissed their stale review 5 years ago
Change naming of moments to Moment_x_<weight_name>
0e0ef3e3
Modify zero test to hit bert like scenario.
22be6146
Revert "Modify zero test to hit bert like scenario."
0e4334f0
Add checkpointing code and zero checkpoint aggregation
ad10dcba
Correct aggregation for LAMB, cleanup
580788e3
Add simple checkpointing test
89f33ff5
Add test for zero checkpoint aggregation
fc05df80
Fix tests
57ddebfe
fix test
3e366524
Review changes
415295f7
Fix test after review comment fix
7f1540fb
Fix API, test
ee759f3f
ashbhandare force pushed to ee759f3f 5 years ago
Fix test after API change
d804b7ed
ashbhandare removed review request from SherlockNoMad 5 years ago
Decouple save load from ORTTrainer
e1d13a27
thiagocrepaldi requested changes on 2020-04-28
Add flag to not break checkpointing with ORTModel'
4e885090
thiagocrepaldi approved these changes on 2020-04-28
thiagocrepaldi approved these changes on 2020-04-29
ashbhandare merged 58f53966 into master 5 years ago
ashbhandare deleted the aibhanda/distributed_checkpoint branch 5 years ago
Reviewers: thiagocrepaldi, jessebenson
Assignees: No one assigned
Labels: training
Milestone: No milestone