DeepSpeed
support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix
#5259
Merged

tjruwase merged 20 commits into deepspeedai:master from inkcherry:bf16_moe
inkcherry support bf16_optimizer moe training
8599e34b
inkcherry requested a review from mrwyattii 2 years ago
inkcherry requested a review from tjruwase 2 years ago
tjruwase removed review request from mrwyattii 2 years ago
tjruwase requested a review from tohtana 2 years ago
tjruwase Merge branch 'master' into bf16_moe
259ddf0e
inkcherry fix real_dp world_size
8179f3c1
inkcherry Merge branch 'bf16_moe' of https://github.com/inkcherry/DeepSpeed int…
c01ead39
inkcherry Make the gradient and gradient norm scale of MOE more reasonable.
b4221735
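The grad-scale commits concern how expert gradients are averaged under expert parallelism. The timeline does not show the diff, so the following is only a simplified single-process simulation under stated assumptions: `dp_world_size` ranks are split into expert-parallel groups of `ep_size` consecutive ranks, each group holds one replica of a given expert, and that expert's gradient is sum-all-reduced across its expert-data-parallel group (one replica per EP group). The function name `expert_grad_after_allreduce` is hypothetical, not a DeepSpeed API. Under these assumptions, dividing by the expert-data-parallel group size (`dp_world_size // ep_size`) inflates the expert gradient by a factor of `ep_size` relative to a dense data-parallel average, while dividing by `dp_world_size` keeps it independent of `ep_size`.

```python
def expert_grad_after_allreduce(local_grads, ep_size, divide_by):
    """Simulate one expert parameter's gradient after expert-data-parallel reduction.

    Hypothetical helper for illustration only.
    local_grads: gradient contribution of each data-parallel rank's tokens to
        this expert (len(local_grads) == dp_world_size).
    ep_size: number of consecutive ranks per expert-parallel group (assumed layout).
    divide_by: averaging divisor applied after the sum-all-reduce.
    """
    dp_world_size = len(local_grads)
    if dp_world_size % ep_size != 0:
        raise ValueError("dp_world_size must be divisible by ep_size")
    # Within each EP group, the all-to-all backward pass accumulates the
    # contributions of all ep_size ranks onto the single local expert replica.
    replica_grads = [
        sum(local_grads[start:start + ep_size])
        for start in range(0, dp_world_size, ep_size)
    ]
    # Sum-all-reduce across the expert-data-parallel group (one replica per EP group).
    reduced = sum(replica_grads)
    return reduced / divide_by


if __name__ == "__main__":
    local_grads = [1.0, 2.0, 3.0, 4.0]  # dp_world_size = 4
    ep_size = 2
    dense_average = sum(local_grads) / len(local_grads)  # 2.5
    expert_dp_size = len(local_grads) // ep_size  # 2
    print(expert_grad_after_allreduce(local_grads, ep_size, expert_dp_size))  # 5.0
    print(expert_grad_after_allreduce(local_grads, ep_size, len(local_grads)))  # 2.5
```

With four ranks and `ep_size=2`, the dense reference average is 2.5; dividing by the expert-data-parallel size gives 5.0, while dividing by `dp_world_size` gives 2.5, the same value as with `ep_size=1`.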
inkcherry clean up
8510d5d0
inkcherry moe grad scale fix
4a0efe66
inkcherry Merge remote-tracking branch 'master' into HEAD
92fbd5ec
mosheisland commented on 2024-03-19
mosheisland commented on 2024-03-19
inkcherry fix dp_world_size position
c2d50eae
inkcherry make grad_norm more precise for fp16
51c9136d
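The grad_norm commits concern forming one global gradient norm when expert gradients live on only some ranks. As a hedged illustration of the arithmetic only (not the PR's code), partial L2 norms over disjoint parameter sets combine as the square root of the sum of their squares, never by summing or averaging the norms themselves. The sketch assumes that the ranks of one expert-parallel group together hold every expert exactly once, and that non-expert gradients are identical on all ranks after their all-reduce; `l2_norm` and `combine_grad_norms` are hypothetical helpers.

```python
import math


def l2_norm(values):
    """Plain L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(v * v for v in values))


def combine_grad_norms(dense_norm, expert_norms):
    """Combine partial L2 norms into one global L2 norm (hypothetical helper).

    dense_norm: norm of the non-expert gradients (assumed identical on every
        rank after their data-parallel all-reduce, so counted once).
    expert_norms: local expert-gradient norm of each rank in one
        expert-parallel group (assumed to cover every expert exactly once).
    """
    # Squared norms add across disjoint parameter sets; the norms themselves do not.
    total_sq = dense_norm ** 2 + sum(n ** 2 for n in expert_norms)
    return math.sqrt(total_sq)


if __name__ == "__main__":
    dense = [3.0, 4.0]  # non-expert gradients
    expert_shards = [[1.0, 2.0], [2.0]]  # expert gradients on EP rank 0 and rank 1
    combined = combine_grad_norms(l2_norm(dense), [l2_norm(s) for s in expert_shards])
    flat = l2_norm(dense + [v for shard in expert_shards for v in shard])
    print(math.isclose(combined, flat))  # True
```

Combining the partial norms this way matches the norm of the flattened gradient vector, here sqrt(34).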
inkcherry refine code
ad8803bc
inkcherry requested a review from awan-10 2 years ago
inkcherry tp compatibility
1d51f69b
inkcherry clean up
7de966cf
inkcherry fix typo
443b8ec9
inkcherry changed the title from "support bf16_optimizer moe expert parallel training" to "support bf16_optimizer moe expert parallel training and moe grad_scale/grad_norm fix" 2 years ago
inkcherry fix operator order
973f2717
inkcherry changed the title from "support bf16_optimizer moe expert parallel training and moe grad_scale/grad_norm fix" to "support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix" 2 years ago
inkcherry Merge branch 'master' into bf16_moe
44505aa0
mosheisland commented on 2024-03-25
inkcherry fix total_norm .item()
f22d141c
tjruwase Merge branch 'master' into bf16_moe
e4401ec0
tohtana approved these changes on 2024-03-26
inkcherry fix ut
966a074a
inkcherry requested a review from loadams 2 years ago
inkcherry Merge branch 'bf16_moe' of https://github.com/inkcherry/DeepSpeed int…
6e214739
tjruwase merged e5dd5501 into master 2 years ago