bigscience-workshop/Megatron-DeepSpeed · Commits
Branches:
LS/alibi
LS/doc
Lucile/add-eval-only-arg
Lucile/delete_unnecessary_brackets
Lucile/useless-parenthesis
add-valid-data
bitfit
bloom-ds-inference-repos2
bloom-inference-meta
bnb-resume-2x
bseval_harness
cc-concurrency
chpt-conversion-fix
ckptavg
cluster_benchmark
consumed_samples_per_valid_dataset
cyclic_valid_dataloaders
debug_with_new_dataset
dependabot/pip/black-24.3.0
ds_ckpt_reshape-with-layer-norm-auto-sync
ds-version-check
fix-sample-ids
fp32-checkpoint-extraction
gpu-direct
hadyelsahar/main
launch-debug
license
log-grad-norm
lumi_eval
lumi_mtf
main
master
megatron-2.4-ds-pipe
mtf_p3
mtf-multival
new-dataset
no-shuffling-option
nozero_reshape
olruwase/ds_ckpt_reshape
olruwase/sync_layer_norms
prefixbseval
preprocess_from_HF_dataset
rm-duplicate-param-count
samson/spm
scratchpad
self_attention_stable_corby
skip-broken-tests
sync4
t0loading
test-conversion
thomas/add_shared_t5
thomas/evaluate_gpt_on_prefix_lm_loss
thomas/evaluate_gpt_speed_if_we_pass_attention_mask
thomas/fix_installation
thomas/fix_layer_norm
thomas/improve_test_to_test_custom_kernel
thomas/mlm_train_script
thomas/opt
thomas/test_different_layer_norm
tp-ln-debug
tr1-13B
tr8-104B
train-no-eval-restart
training_flos_rebase
training_flos
universal_ckpt_info
universal_to_fp32_checkpoint
val_args
Commits:
Better · thomasw21 committed 3 years ago · 07ccb3db
Backward compatibility on torch · thomasw21 committed 3 years ago · f0d6d179
Woops · thomasw21 committed 3 years ago · 0cf35ee3
Add embed layer norm · thomasw21 committed 3 years ago · 2389bfdf
Woops · thomasw21 committed 3 years ago · 8ffb278f
Woops · thomasw21 committed 3 years ago · 39d4b8f9
Woops · thomasw21 committed 3 years ago · 311e5317
I think bias starts to diverge first · thomasw21 committed 3 years ago · 7145f6df
Try to figure out how the divergence happens · thomasw21 committed 3 years ago · a5e32958
Remove assert · thomasw21 committed 3 years ago · 6b19339c
Woops · thomasw21 committed 3 years ago · a0c09132
Make test to work with both bf16 and fp16 to see who fails · thomasw21 committed 3 years ago · 5fbe1072
Woops · thomasw21 committed 3 years ago · a4172bf9
Launch bf16 · thomasw21 committed 3 years ago · 7f2441ed
Have high LR to see weights actually change · thomasw21 committed 3 years ago · c20c8ba4
Huh · thomasw21 committed 3 years ago · 42d6b4e3
Still trying to reproduce · thomasw21 committed 3 years ago · 02365d14
Test with alibi · thomasw21 committed 3 years ago · ce02dd16
Woops · thomasw21 committed 3 years ago · f152e487
Woops · thomasw21 committed 3 years ago · 1f2f8007
Woops · thomasw21 committed 3 years ago · 7fcff06b
WIP · thomasw21 committed 3 years ago · 29372806
Woops · thomasw21 committed 3 years ago · 1cdcd7de
Wip · thomasw21 committed 3 years ago · 240f673e
WIP · thomasw21 committed 3 years ago · 8d7a6038
[tensorboard] add rename and remove event tools (#269) · stas00 committed 3 years ago · Verified · affff3d2
[kill switch] fix test (#268) · stas00 committed 3 years ago · Verified · 26feccc1
disable samples-per-dataset, steps-per-dataset, tokens-per-dataset (#267) · stas00 committed 3 years ago · Verified · de3a0277
[kill switch] correct sys.exit (#266) · stas00 committed 3 years ago · Verified · 1893811e
Sorry, last change was meant to a PR. This reverts commit d0fcf4170def7205426117016d4622c745f33883. · TevenLeScao committed 3 years ago · 497aa1bf