DeepSpeed
Introduce all_reduce_hook to support gradient aggregation across replica groups.
#7764
Open
