DeepSpeed
Fix a convergence issue in TP topology caused by incorrect grad_norm.
#5411
Merged

inkcherry
inkcherry fix grad norm for tp
287fa5eb
inkcherry refine code
a7e8a7fe
inkcherry remove unnecessary clip_gradients fun
ea41928e
inkcherry improve perf by loop-free implementations
e74b7ca6
inkcherry Modify the comments.
79cc4cef
inkcherry update
3ebed5ea
inkcherry Merge remote-tracking branch 'master' into tp_grad_fix
fc537b8e
inkcherry requested a review from mrwyattii 1 year ago
inkcherry requested a review from tjruwase 1 year ago
tjruwase
tjruwase removed review request from mrwyattii 1 year ago
tjruwase requested a review from tohtana 1 year ago
tjruwase requested a review from conglongli 1 year ago
conglongli
conglongli requested changes on 2024-04-15
inkcherry refine comments
df976ca6
tohtana
tohtana approved these changes on 2024-04-16
conglongli Merge branch 'master' into tp_grad_fix
a40263f5
conglongli enabled auto-merge 1 year ago
conglongli
conglongli approved these changes on 2024-04-16
conglongli merged 0896503e into master 1 year ago

Assignees
No one assigned