Test different layer norm #270
WIP
8d7a6038
Wip
240f673e
Woops
1cdcd7de
WIP
29372806
Woops
7fcff06b
Woops
1f2f8007
Woops
f152e487
Test with alibi
ce02dd16
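For context on the "Test with alibi" commit: ALiBi replaces learned positional embeddings with a per-head linear bias added to attention scores. A minimal sketch (hypothetical helper names, slope formula from the ALiBi paper, assuming a power-of-two head count) of how those biases are typically computed:

```python
def alibi_slopes(n_heads):
    # Geometric slopes from the ALiBi paper: for n heads (assumed to be a
    # power of two), slope_h = 2 ** (-8 * (h + 1) / n_heads).
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(seq_len, slope):
    # Additive attention bias for one head: query position i penalizes
    # key position j by slope * (i - j); future keys (j > i) get -inf,
    # folding the causal mask into the same matrix.
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]

slopes = alibi_slopes(8)          # head 0 gets 2**-1, head 7 gets 2**-8
bias = alibi_bias(4, slopes[0])   # 4x4 bias matrix for the first head
```

This is a reference sketch of the technique, not the PR's implementation; in practice the bias is built once as a tensor and broadcast across the batch.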
Still trying to reproduce
02365d14
Huh
42d6b4e3
Have high LR to see weights actually change
c20c8ba4
Launch bf16
7f2441ed
Woops
a4172bf9
Make test to work with both bf16 and fp16 to see who fails
5fbe1072
Woops
a0c09132
Remove assert
6b19339c
Try to figure out how the divergence happens
a5e32958
I think bias starts to diverge first
7145f6df
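The "I think bias starts to diverge first" commit implies comparing parameters between a reference run and the diverging run to see which tensor drifts first. A hypothetical sketch (names and data invented for illustration) of ranking parameters by worst element-wise gap:

```python
def rank_divergence(ref, other):
    # ref/other: {param_name: flat list of values} from two runs.
    # Return parameter names sorted by their worst absolute
    # element-wise difference, most-diverged first.
    gaps = {
        name: max((abs(a - b) for a, b in zip(ref[name], other[name])), default=0.0)
        for name in ref
    }
    return sorted(gaps, key=gaps.get, reverse=True)

ref = {"ln.weight": [1.0, 1.0], "ln.bias": [0.0, 0.0]}
run = {"ln.weight": [1.0, 1.001], "ln.bias": [0.1, -0.05]}
order = rank_divergence(ref, run)  # bias drifts more than weight here
```

Checking this ordering step by step through training is one way to confirm which parameter group (here, the layer-norm bias) moves away first.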
Woops
311e5317
Woops
39d4b8f9
Woops
8ffb278f
Add embed layer norm
2389bfdf
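The "Add embed layer norm" commit presumably inserts a LayerNorm directly after the token embedding lookup, as some models (e.g. BLOOM) do to stabilize low-precision training. A minimal pure-Python sketch with hypothetical names, omitting the learnable gain/bias:

```python
def layer_norm(xs, eps=1e-5):
    # Normalize a vector to zero mean and unit variance.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]

def embed(token_ids, table, norm_after_embed=True):
    # Look up embedding rows and (optionally) layer-norm each one,
    # mirroring an "embed layer norm" placed before the first block.
    rows = [table[t] for t in token_ids]
    return [layer_norm(r) for r in rows] if norm_after_embed else rows

table = {0: [1.0, 2.0, 3.0, 4.0], 1: [0.5, 0.5, 1.5, 1.5]}
out = embed([0, 1], table)
```

Normalizing here keeps embedding activations in a bounded range before they enter the first attention block, which is one common mitigation for fp16 overflow.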
Woops
0cf35ee3
Backward compatibility on torch
f0d6d179
Better
07ccb3db
Merge remote-tracking branch 'origin/main' into thomas/test_different…
3c5e4914
fix
a5b5edc0
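Several commits above chase a bf16-vs-fp16 divergence in the layer norm. As context (not the PR's actual test), a self-contained sketch that emulates bfloat16 rounding in pure Python and measures how far a low-precision layer norm drifts from the full-precision result:

```python
import struct

def to_bf16(x: float) -> float:
    # Approximate a bfloat16 cast by truncating the float32 bit pattern
    # to its top 16 bits (sign, 8 exponent bits, 7 mantissa bits).
    # Real casts round-to-nearest; truncation is close enough for a demo.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def layer_norm(xs, eps=1e-5, round_fn=lambda v: v):
    # Plain layer norm; `round_fn` is applied after each intermediate
    # step to mimic doing the arithmetic in a low-precision dtype.
    n = len(xs)
    mean = round_fn(sum(map(round_fn, xs)) / n)
    var = round_fn(sum(round_fn((x - mean) ** 2) for x in xs) / n)
    inv = round_fn((var + eps) ** -0.5)
    return [round_fn((x - mean) * inv) for x in xs]

xs = [0.1 * i for i in range(8)]
exact = layer_norm(xs)
low = layer_norm(xs, round_fn=to_bf16)
max_err = max(abs(a - b) for a, b in zip(exact, low))
print(max_err)  # nonzero: bf16's 7 mantissa bits lose precision at every step
```

Swapping in an fp16-style rounding function for `round_fn` and comparing the two error curves is one way to see which dtype's layer norm drifts first, which is what the "see who fails" commit is after.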