HF Trainer: ALST/Ulysses sequence parallelism integration via HF Accelerate (#41832)
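For context: Ulysses-style sequence parallelism (ALST) shards each batch along the sequence dimension across the SP ranks and uses all-to-all collectives to swap the sharding from sequence to attention heads and back, so every rank attends over the full sequence for a subset of heads. Below is a minimal, self-contained sketch of that core exchange, for illustration only; it is not the DeepSpeed/Accelerate implementation and the helper name is made up.

    # Illustrative sketch of the Ulysses all-to-all, not the actual
    # DeepSpeed/Accelerate code; seq_to_head_shard is a hypothetical name.
    import torch
    import torch.distributed as dist

    def seq_to_head_shard(x: torch.Tensor, sp_group) -> torch.Tensor:
        """(b, s/P, h, d) -> (b, s, h/P, d): trade a sequence shard for a head shard."""
        P = dist.get_world_size(group=sp_group)
        b, s_local, h, d = x.shape
        assert h % P == 0, "attention heads must be divisible by the SP degree"
        # Split the head dim into P groups and make the group index the leading
        # dim, so all_to_all_single sends head group p to rank p.
        inp = x.reshape(b, s_local, P, h // P, d).permute(2, 0, 1, 3, 4).contiguous()
        out = torch.empty_like(inp)
        dist.all_to_all_single(out, inp, group=sp_group)
        # Rank r now holds every rank's sequence shard for head group r;
        # stitch the shards back into the full sequence.
        return out.permute(1, 0, 2, 3, 4).reshape(b, s_local * P, h // P, d)

After attention, the inverse all-to-all restores the sequence sharding; the attention computation itself is unchanged, which is why ALST composes with existing attention kernels.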
* make it work and add tests
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* cleanup
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* undo
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* normalize
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* always return cp_size
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
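(Rationale: with cp_size defined to be 1 whenever sequence parallelism is disabled, call sites can scale by it unconditionally instead of branching. A hypothetical accessor sketch, assuming the degree lives on Accelerate's parallelism config; the real attribute layout may differ:)

    def sp_degree(accelerator) -> int:
        # Hypothetical helper: 1 when sequence parallelism is off,
        # so callers never need an "is SP enabled?" branch.
        pc = getattr(accelerator, "parallelism_config", None)
        return getattr(pc, "cp_size", 1) if pc is not None else 1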
* cleanup
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* extract code into _deepspeed_cp_compute_loss
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
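Background for the refactor: under sequence parallelism each rank computes loss only on its own sequence shard, so shard losses must be combined weighted by the number of unmasked tokens per rank rather than naively averaging rank-local means. A sketch of that aggregation, for illustration only (not the actual _deepspeed_cp_compute_loss body; the name below is made up):

    import torch
    import torch.distributed as dist

    def aggregate_shard_loss(loss_sum, num_tokens, sp_group):
        """Each rank passes the sum of its shard's per-token losses and its
        count of non-masked tokens; the global mean then weights every token
        equally, so uneven shards and padding do not skew the loss."""
        totals = torch.stack([loss_sum.float(), num_tokens.float()])
        dist.all_reduce(totals, op=dist.ReduceOp.SUM, group=sp_group)
        return totals[0] / totals[1].clamp(min=1.0)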
* fix
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* ALST/Ulysses sequence parallelism docs
* fix typo
* add link to UlyssesSPDataLoaderAdapter
* adapt to the CP -> SP renaming
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* improve
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* fix
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* Update docs/source/en/deepspeed.md
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* address comments
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* Update src/transformers/trainer.py
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* address comments
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* Update src/transformers/trainer.py
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* style
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
* Update docs/source/en/deepspeed.md
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* Account for Sequence Parallelism (SP) dataloader adapter effect
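Context for this entry: the SP dataloader adapter re-dispatches batches so the whole SP group cooperates on one rank's batch at a time, which changes how many micro-batches each rank iterates per epoch, and the Trainer's step accounting has to follow. A hypothetical sketch, assuming a replay factor equal to the SP degree (an assumption, not a documented contract):

    def updates_per_epoch(num_batches: int, sp_size: int, grad_accum_steps: int) -> int:
        # Assumption: the adapter replays each source batch sp_size times
        # (once per group member's batch, sharded along the sequence dim).
        micro_batches = num_batches * sp_size
        return max(micro_batches // grad_accum_steps, 1)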
* Update src/transformers/trainer.py
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* Update docs/source/en/deepspeed.md
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* set model_accepts_loss_kwargs to False
* better comment
* Apply suggestions from code review (@kashif)
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Apply suggestion from @kashif
---------
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>