DeepSpeed
8b191d7c - Long sequence parallelism (Ulysses) integration with HuggingFace (#5774)

Long sequence parallelism (Ulysses) integration with HuggingFace (#5774)

This PR enhances the capabilities of [DeepSpeed long sequence (context) parallelism (aka DS Ulysses)](https://dl.acm.org/doi/10.1145/3662158.3662806) with support for HuggingFace (and, by extension, other frameworks') models. With HF integration, users can apply sequence parallelism to model pre-/mid-/post-training, finetuning, etc.

Usage requires both _torch >= 2.2.2 and flash-attention_. ZeRO-1 and ZeRO-2 are supported; ZeRO-3 and SDPA support are in progress.

The corresponding PR in HF is [PR32305](https://github.com/huggingface/transformers/pull/32305).

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
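As a rough illustration of the support matrix the commit describes, the sketch below builds a minimal DeepSpeed-style config dict using ZeRO-2 (one of the supported stages) and checks it against that constraint. The `sequence_parallel_size` key and the `check_config` helper are assumptions for illustration only, not the confirmed configuration surface of this PR; consult the DeepSpeed and HF documentation for the actual knobs exposed by the integration.

```python
# Hedged sketch: a ZeRO-2 config, the highest stage this commit reports as
# supported with DS Ulysses. "sequence_parallel_size" is a hypothetical key
# used here for illustration, not a confirmed option from this PR.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO-1 and ZeRO-2 are supported
    # Hypothetical: shard each sequence across 4 ranks (assumption).
    "sequence_parallel_size": 4,
}

def check_config(cfg):
    """Sanity-check a config against the support matrix described in the
    commit: ZeRO-3 is still in progress, so require stage <= 2."""
    stage = cfg.get("zero_optimization", {}).get("stage", 0)
    return stage in (0, 1, 2)

print(check_config(ds_config))  # prints True
```

A config requesting ZeRO-3 would fail this check until that support lands.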