transformers
7e0ea699 - HF Trainer: ALST/Ulysses sequence parallelism integration via HF Accelerate (#41832)

Committed 29 days ago
HF Trainer: ALST/Ulysses sequence parallelism integration via HF Accelerate (#41832)

Squashed commits:

* HF Trainer: ALST/Ulysses sequence parallelism integration via HF Accelerate
* make it work + tests
* cleanup
* undo
* normalize
* always return cp_size
* cleanup
* extract code into _deepspeed_cp_compute_loss
* fix
* ALST/Ulysses sequence parallelism docs
* typo
* add link to UlyssesSPDataLoaderAdapter
* adapt to renaming to SP
* improve
* fix
* Update docs/source/en/deepspeed.md
* address comments
* address comments
* Update src/transformers/trainer.py
* address comments
* address comments
* Update src/transformers/trainer.py
* Update src/transformers/trainer.py
* style
* Update docs/source/en/deepspeed.md
* Update docs/source/en/deepspeed.md
* Account for Sequence Parallelism (SP) dataloader adapter effect
* Update src/transformers/trainer.py
* Update docs/source/en/deepspeed.md
* Update docs/source/en/deepspeed.md
* model_accepts_loss_kwargs to False
* better comment
* Apply suggestion from @kashif
* Apply suggestion from @kashif
* Apply suggestions from code review
* Apply suggestion from @kashif
* Apply suggestion from @kashif
* Apply suggestion from @kashif
* Update src/transformers/trainer.py
* Update src/transformers/training_args.py
* Apply suggestion from @kashif
* Apply suggestion from @kashif

---------

Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
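Context for the change: ALST (Arctic Long Sequence Training) builds on DeepSpeed-Ulysses sequence parallelism, which shards the sequence axis across SP ranks and uses an all-to-all before attention so that each rank attends over the full sequence with a subset of heads. Below is a single-process numpy sketch of that layout swap, illustration only: the real integration uses torch.distributed all-to-all collectives inside DeepSpeed, and every name here is ours, not the Trainer's.

```python
# Single-process sketch of the Ulysses-style all-to-all layout swap.
import numpy as np

sp = 2                               # sequence-parallel degree (number of SP ranks)
batch, seq, heads, dim = 1, 8, 4, 3  # toy attention shapes; seq % sp == 0, heads % sp == 0
x = np.arange(batch * seq * heads * dim, dtype=np.float32).reshape(batch, seq, heads, dim)

# Before attention, each rank holds a contiguous shard of the sequence axis:
shards = [x[:, r * seq // sp : (r + 1) * seq // sp] for r in range(sp)]

def ulysses_all_to_all(shards, sp):
    """Regroup sequence shards so each rank sees the full sequence for heads/sp heads."""
    out = []
    for r in range(sp):
        h0, h1 = r * heads // sp, (r + 1) * heads // sp
        # each rank collects its head slice from every sequence shard and
        # stitches the full sequence back together along the sequence axis
        out.append(np.concatenate([s[:, :, h0:h1] for s in shards], axis=1))
    return out

per_head = ulysses_all_to_all(shards, sp)
# each rank can now run ordinary full-sequence attention on its heads/sp heads
assert per_head[0].shape == (batch, seq, heads // sp, dim)
```

Concatenating the per-rank outputs back along the head axis recovers the original tensor, which is the role of the inverse all-to-all applied after attention.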
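The squash log mentions extracting `_deepspeed_cp_compute_loss`. A reason sequence parallelism needs dedicated loss handling: each SP rank computes a mean loss over only its sequence shard, so shard losses must be combined weighted by each shard's count of valid (unmasked) label tokens rather than averaged naively. A hypothetical sketch of that aggregation (the function name and inputs are ours, not the Trainer's):

```python
# Token-weighted aggregation of per-shard mean losses (hypothetical sketch).
def combine_shard_losses(shard_losses, shard_token_counts):
    """Combine per-shard mean losses, weighting each by its unmasked-token count."""
    total_tokens = sum(shard_token_counts)
    if total_tokens == 0:
        return 0.0  # all labels masked; nothing to average
    weighted = sum(loss * n for loss, n in zip(shard_losses, shard_token_counts))
    return weighted / total_tokens

# Two shards: mean loss 2.0 over 3 tokens and 4.0 over 1 token.
# A naive average would give 3.0; the token-weighted loss is 2.5.
print(combine_shard_losses([2.0, 4.0], [3, 1]))  # -> 2.5
```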