Megatron-DeepSpeed
Support skip iteration flag
#177
Merged

stas00 merged 42 commits into main from skip-iterations
jaketae
feature: support skip iteration flag
cde7a559
fix: robust input check for skip ranges
5a8015c1
feature: fast forward megatron train loop
453b806a
jaketae test: add basic test for skip iteration
4a1ef27f
jaketae marked this pull request as ready for review 4 years ago
stas00 commented on 2021-11-04
stas00 commented on 2021-11-04
stas00 commented on 2021-11-04
jaketae Update megatron/training.py
7ae82e2f
jaketae fix: merge overlapping intervals
e473f4d7
jaketae fix: flush irrelevant intervals, fix boundary condition
460e6eb0
stas00 commented on 2021-11-05
jaketae Update megatron/training.py
0f01330e
jaketae feature: log on rank 0
0ff51ea3
jaketae fix: use f-string
356bb579
jaketae fix: iteration is incremented first, then logged
4ac84b4e
jaketae test: add checks using stdout
a92b7ab2
stas00 commented on 2021-11-05
stas00 commented on 2021-11-06
jaketae commented on 2021-11-06
jaketae commented on 2021-11-06
jaketae Update tests/test_training.py
d29e6892
jaketae Update tests/test_training.py
c657084b
jaketae refactor: use loop to simplify asserts
8947cc6c
jaketae fix: end will be larger than last end
27329847
jaketae test: add checks on consumed tokens
f589eafc
jaketae test: use parametrized variations
a1164c3c
jaketae test: simplify skip iter test to base, cl
d9aaa0b1
stas00 Merge remote-tracking branch 'origin/main' into skip-iterations
48dbe64d
stas00 Trigger CI
2eb2a66b
stas00 2x instances
87116b34
stas00 2x instances
15507835
jaketae test: hard code num_gpus to 2
0b56230f
jaketae test: change test name to zskip
10251c38
jaketae test: revert back to `get_gpu_count()`
205d8685
jaketae test: run only skip test
7900da52
jaketae test: remove skip iter test
0bbe4044
jaketae fix: account for other ranks
fc811084
stas00 rework the test to do the right thing for cl
4b7de290
stas00 Merge remote-tracking branch 'origin/main' into skip-iterations
ded71f4e
stas00 undo debug
4fee00d7
stas00 wip
e1c23d0c
stas00 success
9a649fa9
jaketae commented on 2021-11-11
jaketae Update megatron/arguments.py
d1caab31
conglongli commented on 2021-11-11
stas00 commented on 2021-11-14
jaketae fix: update flag name
4abcd8c4
jaketae chore: backport commit 7a0158e
1a70624c
jaketae chore: simplify test
989e2c6c
stas00 Trigger CI
f1e9283a
stas00 Trigger CI
bb29ae97
stas00 small tweaks
87ef7991
stas00 Trigger CI
4e0581a0
stas00 merged 106a9a6f into main 4 years ago
stas00 deleted the skip-iterations branch 4 years ago
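
Several commits in this timeline mention validating skip ranges, merging overlapping intervals, and fast-forwarding the train loop. The PR's actual code in megatron/training.py is not shown on this page, so the following is only a minimal sketch of that general idea. The function names `merge_skip_ranges` and `should_skip`, and the representation of ranges as inclusive `(start, end)` tuples, are assumptions made for illustration and are not taken from the repository.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration only; not the PR's implementation.


def merge_skip_ranges(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Validate inclusive (start, end) ranges, sort them, and merge overlaps."""
    for start, end in ranges:
        if start > end:
            raise ValueError(f"invalid skip range: {start}-{end}")

    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: keep the larger of the two ends.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def should_skip(iteration: int, merged: List[Tuple[int, int]]) -> bool:
    """Return True if the iteration falls inside any inclusive skip range."""
    return any(start <= iteration <= end for start, end in merged)


if __name__ == "__main__":
    skip = merge_skip_ranges([(5, 8), (1, 3), (2, 4)])
    for iteration in range(1, 10):
        action = "skip" if should_skip(iteration, skip) else "train"
        print(f"iteration {iteration}: {action}")
```

In this sketch, ranges that merely touch (for example 1-3 and 4-5) are kept separate, because only true overlaps are merged. Whether the real implementation treats adjacent ranges the same way is not stated in the timeline.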
