DeepSpeed
a02de228 - pipe engine _aggregate_total_loss: more efficient loss concatenation (#4327)

Commit
2 years ago
pipe engine _aggregate_total_loss: more efficient loss concatenation (#4327)

* _aggregate_total_loss: more efficient loss concatenation

  Optimize the _aggregate_total_loss function to remove the dependency on copying from device to host and back to device. This reduces the runtime on the host.

* Fixing the if/else block on which the optimization should take place

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
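
The kind of change described here can be illustrated with a minimal PyTorch sketch. It is not the code from the commit: the function names `concat_losses_via_host` and `concat_losses_on_device` are hypothetical, and the sketch assumes the two losses are 0-dim tensors of the same dtype that already live on the same device. The first function reads each scalar back to the host and rebuilds a tensor there, which forces a device-to-host-to-device round trip. The second stacks the existing tensors directly, so the data never leaves the device.

```python
import torch


def concat_losses_via_host(dp_group_loss, agg_loss):
    """Hypothetical 'before' pattern: round trip through host memory."""
    device = agg_loss.device
    # .item() copies each scalar to the host (and synchronizes on an accelerator);
    # torch.tensor(..., device=device) then copies the new tensor back to the device.
    return torch.tensor([dp_group_loss.item(), agg_loss.item()], device=device)


def concat_losses_on_device(dp_group_loss, agg_loss):
    """Hypothetical 'after' pattern: concatenate without leaving the device."""
    # torch.stack builds the 2-element tensor from the existing device tensors.
    return torch.stack([dp_group_loss, agg_loss]).float()


if __name__ == "__main__":
    # Assumption for the demo: use an accelerator if present, otherwise CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dp_group_loss = torch.tensor(1.5, device=device)
    agg_loss = torch.tensor(2.5, device=device)

    via_host = concat_losses_via_host(dp_group_loss, agg_loss)
    on_device = concat_losses_on_device(dp_group_loss, agg_loss)
    print(torch.equal(via_host, on_device))
```

Both functions return the same values. The difference is only where the work happens, which matters when the host would otherwise block waiting for the device to finish before it can read the scalars.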