Megatron-DeepSpeed
adding scalenorm, attention_init_method and relu^2
#139
Open


huu4ontocord: adding scalenorm, attention_init_method which uses the normal init wi…
ee876652
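The PR title names two architectural changes besides the init method: ScaleNorm and relu^2. The following is a minimal, framework-free sketch of what those terms usually mean. It is not the code from this PR. ScaleNorm (Nguyen & Salazar, 2019) replaces LayerNorm by rescaling each vector to unit L2 norm and multiplying by a single learned scalar `g`, commonly initialized to sqrt(d). relu^2 is the squared ReLU activation. The function names `scale_norm` and `relu_squared` are hypothetical, and the `eps` guard against a zero norm is an assumption, since implementations differ. The attention_init_method change is not illustrated because the commit message describing it is truncated.

```python
import math
from typing import List


def scale_norm(x: List[float], g: float, eps: float = 1e-5) -> List[float]:
    """ScaleNorm sketch: g * x / ||x||_2, with one learned scalar g per layer.

    Clamping the norm at eps (assumption) avoids division by zero for all-zero inputs.
    """
    norm = math.sqrt(sum(v * v for v in x))
    scale = g / max(norm, eps)
    return [v * scale for v in x]


def relu_squared(x: List[float]) -> List[float]:
    """Squared ReLU sketch: max(0, v) ** 2 applied elementwise."""
    return [max(0.0, v) ** 2 for v in x]


if __name__ == "__main__":
    d = 4
    g = math.sqrt(d)  # sqrt(d) is a common initial value for g (here 2.0)
    out = scale_norm([3.0, 0.0, 4.0, 0.0], g)  # input norm 5, output norm ~2
    print(out)
    print(relu_squared([-2.0, 0.5, 3.0]))  # [0.0, 0.25, 9.0]
```

Unlike LayerNorm, this version of ScaleNorm does no mean-centering and uses a single scalar gain instead of a per-feature gain and bias. In a real Megatron-DeepSpeed layer, `g` would be a learnable parameter, and both functions would operate on tensors.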
huu4ontocord requested a review from stas00 4 years ago
huu4ontocord requested a review from thomasw21 4 years ago
huu4ontocord requested a review from jaketae 4 years ago
thomasw21 commented on 2021-10-18
jaketae commented on 2021-10-19

Assignees: No one assigned