DeepSpeed
72608904 - reduce cpu host overhead when using moe (#5578)

Commit
1 year ago
reduce cpu host overhead when using moe (#5578)

The `.to('cpu')` call is not necessary for `exp_counts`, and it forces a device-to-host synchronization that hurts performance.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
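To illustrate why the host copy is costly, here is a minimal standard-library sketch of an asynchronous device queue. It is a toy model, not DeepSpeed or PyTorch code: `ToyDevice`, `launch`, `to_cpu`, and both `gate_*` functions are hypothetical names invented for this illustration. In PyTorch the analogous blocking step is calling `.to('cpu')` on a tensor that lives on an accelerator, which makes the host wait for pending device work before the copy completes.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyDevice:
    """Hypothetical stand-in for an accelerator with one in-order work queue."""

    def __init__(self) -> None:
        self._queue = ThreadPoolExecutor(max_workers=1)
        self.host_syncs = 0  # how many times the host had to wait for the device

    def launch(self, fn, *args) -> Future:
        # Asynchronous: the host gets a handle back immediately and keeps going.
        return self._queue.submit(fn, *args)

    def to_cpu(self, handle: Future):
        # Synchronous: the host blocks here until the queued work has finished.
        self.host_syncs += 1
        return handle.result()


def count_tokens_per_expert(assignments, num_experts):
    """Count how many tokens were routed to each expert."""
    counts = [0] * num_experts
    for expert in assignments:
        counts[expert] += 1
    return counts


def gate_with_host_copy(device, assignments, num_experts):
    # Shape of the old behaviour: every call pulls the counts back to the host.
    handle = device.launch(count_tokens_per_expert, assignments, num_experts)
    return device.to_cpu(handle)


def gate_without_host_copy(device, assignments, num_experts):
    # Shape of the new behaviour: the counts stay on the device, and the
    # caller synchronizes only if it actually needs the values on the host.
    return device.launch(count_tokens_per_expert, assignments, num_experts)
```

In this model, `gate_with_host_copy` increments `host_syncs` on every call, while `gate_without_host_copy` leaves it unchanged until some caller explicitly asks for the values.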