DeepSpeed
Use cuda tensors for allgather
#1548
Merged
