DeepSpeed Sequence (#535)
* add DeepSpeed sequence parallelism implementation
* add communication groups for sequence parallelism
* add all_to_all to torch comm backend
---------
Co-authored-by: Sam Ade Jacobs <samjacobs@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>