onnxruntime
Megatron checkpointing
#6293
Merged

ashbhandare merged 11 commits into master from aibhanda/DH_ckpt
ashbhandare marked this pull request as ready for review 5 years ago
ashbhandare requested a review from BowenBao 5 years ago
ashbhandare requested a review from liqunfu 5 years ago
ashbhandare requested a review from spandantiwari 5 years ago
ashbhandare requested a review from thiagocrepaldi 5 years ago
ashbhandare requested a review 5 years ago
ashbhandare force pushed to 08f1412c 5 years ago
SherlockNoMad added the training label
ashbhandare force pushed from 5b96e5a4 to 6cbc7aea 5 years ago
pengwa commented on 2021-01-13
baijumeswani commented on 2021-01-13
ashbhandare force pushed from 948dab84 to 2d6e1b35 5 years ago
pengwa commented on 2021-01-18
pengwa commented on 2021-01-18
pengwa commented on 2021-01-18
pengwa commented on 2021-01-18
baijumeswani commented on 2021-01-19
ashbhandare force pushed from 2d6e1b35 to c59e0e7b 5 years ago
pengwa dismissed these changes on 2021-01-20
baijumeswani commented on 2021-01-20
baijumeswani commented on 2021-01-20
thiagocrepaldi requested changes on 2021-01-20
baijumeswani commented on 2021-01-21
ashbhandare Add bart fairseq run script
beb937b3
ashbhandare Add frontend change to enable megatron
94651e65
ashbhandare Initial changes for checkpointing
23009b94
ashbhandare Megatron optim state loading, checkpoint aggregation, frontend distri…
468de6c0
ashbhandare Add load_checkpoint changes
708f60e3
ashbhandare Fix CI
11808a40
ashbhandare Cleanup
2c4cdec0
ashbhandare Fix CI
2d813bcb
ashbhandare review comments
1215bd0d
ashbhandare review comments
e34761db
ashbhandare review comments:
5b48e311
ashbhandare dismissed their stale review via 5b48e311 5 years ago
ashbhandare force pushed from c59e0e7b to 5b48e311 5 years ago
baijumeswani commented on 2021-01-22
thiagocrepaldi approved these changes on 2021-01-22
ashbhandare merged 60c772e2 into master 5 years ago
ashbhandare deleted the aibhanda/DH_ckpt branch 5 years ago

Assignees
No one assigned