DeepSpeed
Use cuda events to improve timing for multi-stream execution
#1881
Merged
