support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix #5259
support bf16_optimizer moe training
8599e34b
Merge branch 'master' into bf16_moe
259ddf0e
fix real_dp world_size
8179f3c1
Merge branch 'bf16_moe' of https://github.com/inkcherry/DeepSpeed int…
c01ead39
Make the gradient and gradient norm scale of MOE more reasonable.
b4221735
clean up
8510d5d0
moe grad scale fix
4a0efe66
Merge remote-tracking branch 'master' into HEAD
92fbd5ec
fix dp_world_size position
c2d50eae
make grad_norm more precise for fp16
51c9136d
refine code
ad8803bc
tp compatibility
1d51f69b
clean up
7de966cf
fix typo
443b8ec9
inkcherry
changed the title from "support bf16_optimizer moe expert parallel training" to "support bf16_optimizer moe expert parallel training and moe grad_scale/grad_norm fix" 2 years ago
fix operator order
973f2717
inkcherry
changed the title from "support bf16_optimizer moe expert parallel training and moe grad_scale/grad_norm fix" to "support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix" 2 years ago
Merge branch 'master' into bf16_moe
44505aa0
fix total_norm .item()
f22d141c
Merge branch 'master' into bf16_moe
e4401ec0
tohtana
approved these changes
on 2024-03-26
fix ut
966a074a
Merge branch 'bf16_moe' of https://github.com/inkcherry/DeepSpeed int…
6e214739
tjruwase
merged
e5dd5501
into master 2 years ago