Support skip iteration flag (#177)
* feature: support skip iteration flag
* fix: robust input check for skip ranges
* feature: fast forward megatron train loop
* test: add basic test for skip iteration
* Update megatron/training.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* fix: merge overlapping intervals
* fix: flush irrelevant intervals, fix boundary condition
* feature: log on rank 0
* fix: use f-string
* fix: iteration is incremented first, then logged
* test: add checks using stdout
* Update tests/test_training.py
* refactor: use loop to simplify asserts
* fix: end will be larger than last end
* test: add checks on consumed tokens
* test: use parametrized variations
* test: limit skip iter test to the base and cl variations
* Trigger CI
* 2x instances
* test: hard code num_gpus to 2
* test: change test name to zskip
* test: revert back to `get_gpu_count()`
* test: run only skip test
* test: remove skip iter test
* fix: account for other ranks
* rework the test to do the right thing for cl
* undo debug
* wip
* success
* Update megatron/arguments.py
* fix: update flag name
* chore: backport commit 7a0158e
* chore: simplify test
* small tweaks
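
For readers of this log, a minimal sketch of the interval handling that the "robust input check" and "merge overlapping intervals" commits refer to. It is an illustration only, not the code in megatron/training.py: the names `merge_skip_ranges` and `should_skip` are hypothetical, and the ranges are assumed to be inclusive (start, end) pairs of iteration numbers.

```python
from typing import List, Tuple

Range = Tuple[int, int]


def merge_skip_ranges(ranges: List[Range]) -> List[Range]:
    """Validate inclusive (start, end) ranges, sort them, and merge overlaps.

    Hypothetical helper; the inclusive-range convention is an assumption.
    """
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if start > end:
            raise ValueError(f"invalid skip range: start {start} > end {end}")
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def should_skip(iteration: int, merged: List[Range]) -> bool:
    """Return True if the iteration falls inside any merged skip range."""
    return any(start <= iteration <= end for start, end in merged)
```

With ranges merged up front, a train loop could test each iteration against the list and fast forward past skipped ones without running a training step.
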
Co-authored-by: Jake Tae <>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>