onnxruntime
Improve checkpointing for Zero stage 1
#5478
Merged

ashbhandare merged 18 commits into master from aibhanda/zero_1_ckpt
ashbhandare requested a review from BowenBao 5 years ago
ashbhandare requested a review from liqunfu 5 years ago
ashbhandare requested a review from spandantiwari 5 years ago
ashbhandare requested a review from thiagocrepaldi 5 years ago
ashbhandare requested a review 5 years ago
ashbhandare requested a review from jessebenson 5 years ago
thiagocrepaldi dismissed these changes on 2020-10-13
thiagocrepaldi commented on 2020-10-13
thiagocrepaldi commented on 2020-10-13
thiagocrepaldi requested a review from baijumeswani 5 years ago
thiagocrepaldi
jessebenson commented on 2020-10-15
jessebenson commented on 2020-10-15
jessebenson commented on 2020-10-15
jessebenson dismissed these changes on 2020-10-15
ashbhandare dismissed their stale review via df4070cb 5 years ago
ashbhandare force pushed from c27b5295 to df4070cb 5 years ago
ashbhandare force pushed from df4070cb 5 years ago
ashbhandare force pushed to a1ca6e9b 5 years ago
baijumeswani commented on 2020-10-29
baijumeswani dismissed these changes on 2020-10-30
ashbhandare dismissed their stale review 5 years ago
change has been addressed, dismissing to unblock
ashbhandare dismissed their stale review via 5a453d00 5 years ago
thiagocrepaldi
thiagocrepaldi requested changes on 2020-11-02
ashbhandare force pushed 5 years ago
ashbhandare force pushed to ca498c7b 5 years ago
baijumeswani commented on 2020-11-03
ashbhandare force pushed 5 years ago
ashbhandare force pushed to 20b7fead 5 years ago
ashbhandare force pushed from 20b7fead to 705d25e3 5 years ago
ashbhandare force pushed from 31f1ca8b to dd2db4c3 5 years ago
thiagocrepaldi
ashbhandare
thiagocrepaldi
baijumeswani commented on 2020-11-25
thiagocrepaldi commented on 2020-11-25
ashbhandare force pushed from 0d286b1e to 372e3724 5 years ago
ashbhandare force pushed from 372e3724 5 years ago
ashbhandare force pushed to 59a2698d 5 years ago
thiagocrepaldi
ashbhandare Initial running changes
d71857d8
ashbhandare Checkpointing aggregation changes
2cbe436c
ashbhandare compare with older version
903a44aa
ashbhandare initial cleanup
b31e833e
ashbhandare Add zero test, minor fix
340ef43e
ashbhandare Fix zero test, transform, formatting
3d776787
ashbhandare Review comments
1757e5cc
ashbhandare add more unit tests
bd402344
ashbhandare review comments
0146849e
ashbhandare Try fix CI
29510b78
ashbhandare Add additional check on just aggregation code
9055d292
ashbhandare Try fix ckpt gen
82fef12d
ashbhandare Add pregenerated ckpt for CI, enable zero test in e2e
30d2ce78
ashbhandare Moving test to nightly, removing ckpt files
a928d873
ashbhandare Add tests to dist GPU CI
6adec1b4
ashbhandare Fix dist test
99ca90dd
ashbhandare Review comments
5bc59119
ashbhandare force pushed from 59a2698d to 5bc59119 5 years ago
thiagocrepaldi dismissed these changes on 2020-12-03
ashbhandare dismissed their stale review 5 years ago
thiagocrepaldi commented on 2020-12-04
ashbhandare Fix test
eac63e6c
ashbhandare force pushed to eac63e6c 5 years ago
thiagocrepaldi approved these changes on 2020-12-04
ashbhandare merged 7cebf76a into master 5 years ago
ashbhandare deleted the aibhanda/zero_1_ckpt branch 5 years ago
