Sync layer norm #271

thomasw21 Better
07ccb3db
thomasw21 Force synchronize the layer norms parameters across all TP
391ed488
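A minimal sketch of what such a force-sync of the layer-norm parameters could look like, assuming the parameters are replicated on every tensor-parallel rank and that the process group and world size come from Megatron-style `mpu` helpers (both are assumptions here, not the repo's exact API); averaging with an all-reduce keeps drifting replicas bit-identical:

```python
import torch
import torch.distributed as dist

def sync_layer_norm_params(model, tp_group, tp_world_size):
    """Average every LayerNorm weight/bias over the tensor-parallel group
    so that the replicated copies stay identical across TP ranks."""
    for module in model.modules():
        if isinstance(module, torch.nn.LayerNorm):
            for param in (module.weight, module.bias):
                if param is None:
                    continue
                # all_reduce is in-place: param.data now holds the sum over ranks
                dist.all_reduce(param.data, op=dist.ReduceOp.SUM, group=tp_group)
                param.data.div_(tp_world_size)
```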
thomasw21 changed the base branch from main to thomas/test_different_layer_norm 3 years ago
stas00 import mpu
98d0e7cc
stas00 use the bf16 branch for testing
279a77eb
stas00 `torch.testing.assert_equal` didn't make it (#273)
87a9dba0
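For context on the fix above: the comparison helper that did land in released PyTorch is `torch.testing.assert_close`, so exact tensor comparisons in the tests can be written with zero tolerances instead of the never-released `assert_equal`. A small illustrative usage (the tensors are made up):

```python
import torch
from torch.testing import assert_close

a = torch.ones(1024)
b = torch.ones(1024)
# rtol=0 / atol=0 makes this an exact, element-wise comparison,
# matching the intent of the never-released torch.testing.assert_equal
assert_close(a, b, rtol=0, atol=0)
```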
stas00 Merge remote-tracking branch 'origin/main' into thomas/fix_layer_norm
dbb59140
stas00 bf16 comms require pt-1.11
70f91f82
stas00 already part of the function
835a3e5c
stas00 reproduce the crash on resume
37795a92
stas00 commented on 2022-03-25
stas00 run just the test we want for now
3ec65f7c
thomasw21 all_reduce is an in-place operation
8271d419
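The point this commit relies on, illustrated: `torch.distributed.all_reduce` mutates its input tensor and returns `None` when `async_op=False`, so the reduced value must be read from the tensor itself rather than from the return value (rank values below are only for illustration, and the default process group is assumed to be initialized):

```python
import torch
import torch.distributed as dist

def demo_all_reduce_is_inplace():
    rank = dist.get_rank()
    t = torch.tensor([float(rank)])
    ret = dist.all_reduce(t, op=dist.ReduceOp.SUM)  # synchronous call returns None
    assert ret is None
    # `t` itself now holds the sum over all ranks; no reassignment is needed
    print(f"rank {rank}: t = {t.item()}")
```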
thomasw21 commented on 2022-03-25
thomasw21 Make a test that TP reshaping works
b418b47a
thomasw21 Woops
4b7207b5
thomasw21 Woops
3bc58243
thomasw21 Woops
05c99db6
thomasw21 Woops
55e10c63
thomasw21 Woops
2ab8a3ac
thomasw21 Woops
d357839d
thomasw21 Woops
5fb231c1
thomasw21 Woops
cc7ff45b
thomasw21 Woops
7cdb1be8
thomasw21 Fix load issue
4574ec97
thomasw21 Woops
04e89d14
thomasw21 Fix checkpoint path
e9431002
thomasw21 Test that force sync will allow TP changes
09cead38
thomasw21 Nit
77abee61
thomasw21 Now that we have a force sync mechanism, let's try to reproduce the crash on resume
64a62c80
thomasw21 Compare model_states_rank
0b7afcc9
thomasw21 test
ce017338
thomasw21 Row column bias should be synchronized as well
89ab0b72
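Background for this commit: in Megatron-style tensor parallelism the bias of a row-parallel linear layer is added after the all-reduce of the partial matmul outputs, so it is replicated on every TP rank and can drift exactly like the layer-norm parameters. A hedged sketch of widening the sync to such replicated biases (which parameters count as replicated is model-specific; the explicit list passed in here is an assumption, not the repo's actual bookkeeping):

```python
import torch.distributed as dist

def sync_replicated_params(replicated_params, tp_group, tp_world_size):
    """Average parameters that are meant to be identical on every TP rank,
    e.g. row-parallel linear biases, mirroring the layer-norm force-sync."""
    for param in replicated_params:
        dist.all_reduce(param.data, op=dist.ReduceOp.SUM, group=tp_group)
        param.data.div_(tp_world_size)
```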
thomasw21 New list of matching embeddings
42997b2a
thomasw21 Figure out why state differs
e0ef1683
thomasw21 Test for final weight
1fc4fe82
thomasw21 Test that torch_rng_state matches
7ebbed16
thomasw21 Fix non-matching torch_rng_state for tp_rank=0
2c49216a
thomasw21 Update test
007ecb4b
thomasw21 I'm surprised one can apply an in-place operation here
c3844b5c
thomasw21 Test out the loss from the fp32 weights and optimizer states
189f0547
