DeepSpeed
Transformer-kernel - supporting any arbitrary sequence-length
#587
Merged
