Fix moe cpu offload #5220
Skip gradient-norm averaging when cpu-device is selected
feb578af
fix formatting
9eb6129e
Merge branch 'master' into fix-moe-cpu-offload
9a59df40
average the grad-norms by sending the gradients to GPU when using off…
ae4bc936
Merge branch 'fix-moe-cpu-offload' of https://github.com/RezaYazdaniA…
d8255e07
Merge branch 'master' into fix-moe-cpu-offload
4fc1d8b9
tjruwase approved these changes on 2024-03-04
mrwyattii merged e6e8c137 into master 1 year ago