Megatron-DeepSpeed
Sync layer norm
#271
Open
Commits (38)
Better
thomasw21
committed
4 years ago
Force synchronize the layer norms parameters across all TP
thomasw21
committed
4 years ago
import mpu
stas00
committed
4 years ago
use the bf16 branch for testing
stas00
committed
4 years ago
`torch.testing.assert_equal` didn't make it (#273)
stas00
committed
4 years ago
Merge remote-tracking branch 'origin/main' into thomas/fix_layer_norm
stas00
committed
4 years ago
bf16 comms require pt-1.11
stas00
committed
4 years ago
already part of the function
stas00
committed
4 years ago
reproduce the crashing on resume
stas00
committed
4 years ago
run just the test we want for now
stas00
committed
4 years ago
all_reduce is an in_place operation
thomasw21
committed
4 years ago
Make a test that TP reshaping works
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Fix load issue
thomasw21
committed
4 years ago
Woops
thomasw21
committed
4 years ago
Fix checkpoint path
thomasw21
committed
4 years ago
Test that force sync will allow TP changes
thomasw21
committed
4 years ago
Nit
thomasw21
committed
4 years ago
Now that we have a force sync mechanism, let's try to reproduce
thomasw21
committed
4 years ago
Compare model_states_rank
thomasw21
committed
4 years ago
test
thomasw21
committed
4 years ago
Row column bias should be synchronized as well
thomasw21
committed
4 years ago
New list of matching embeddings
thomasw21
committed
4 years ago
Figure out why state differs
thomasw21
committed
4 years ago
Test for final weight
thomasw21
committed
4 years ago
Test that torch_rng_state
thomasw21
committed
4 years ago
Fix non matching torch_rng_state for tp_rank=0
thomasw21
committed
4 years ago
Update test
thomasw21
committed
4 years ago
I'm surprised one can apply inplace operation here
thomasw21
committed
4 years ago
Test out the loss from the fp32 weights and optimizer states
thomasw21
committed
4 years ago
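
The commit messages above mention force-synchronizing layer norm parameters across all TP (tensor-parallel) ranks, and note that all_reduce is an in-place operation. The following is a minimal, framework-free sketch of that idea, not the code from this PR. It assumes that averaging is the reduction used and that layer norm parameters can be recognized by name. The helper names (`all_reduce_avg_`, `force_sync_layer_norms`) and the parameter names are hypothetical, and each simulated rank is just a dict of plain Python lists.

```python
from typing import Dict, List

# Hypothetical stand-in for one TP rank's parameters: name -> flat list of values.
Params = Dict[str, List[float]]


def all_reduce_avg_(replicas: List[List[float]]) -> None:
    """Simulated all-reduce (average): overwrite every replica in place."""
    n = len(replicas)
    mean = [sum(values) / n for values in zip(*replicas)]
    for replica in replicas:
        # Slice assignment mutates the existing list, so every holder of the
        # list sees the synchronized values (no return value is needed).
        replica[:] = mean


def force_sync_layer_norms(tp_ranks: List[Params]) -> None:
    """Average every layer norm parameter across the simulated TP ranks."""
    for name in tp_ranks[0]:
        # Assumption: layer norm parameters are identified by their name.
        if "layernorm" in name:
            all_reduce_avg_([rank[name] for rank in tp_ranks])


# Two simulated TP ranks whose layer norm weights have drifted apart.
rank0 = {"layernorm.weight": [1.0, 2.0], "dense.weight": [0.5]}
rank1 = {"layernorm.weight": [3.0, 2.0], "dense.weight": [0.7]}

force_sync_layer_norms([rank0, rank1])

print(rank0["layernorm.weight"], rank1["layernorm.weight"])
print(rank0["dense.weight"], rank1["dense.weight"])
```

After the call, both simulated ranks hold identical layer norm weights, while the sharded `dense.weight` entries are left untouched. In a real multi-process run the same role would be played by a collective such as `torch.distributed.all_reduce`, which likewise modifies its tensor argument in place.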