Squash 3 commits to 1
ebb79c86
Add --no-layer-norm-fusion argument
21c90de1
Add --no-optimizer-fusion argument
e0487132
Bugfix (thanks to Thomas Wang for catching this)
18e2c65b
Fix the bug of FusedLayerNorm on ROCm (#96)
9b7cd052
Revert cherry-picked changes to .py
277e1d38
Add LUMI eval compat
2963caea
Update tasks
32f039c2
Merge pull request #1 from bigscience-workshop/lumi_eval
fdd57c4b
add inverse_sqrt lr decay style
2ca2338c
fix no warmup case
ad60932f
use t5x formula
0823ad8c
avoid num_steps > decay_steps case
a093db6f
remove casting as math.sqrt does that
b4601b9e
add lr-warmup-style argument taking "constant" or "linear" values
4dae1399
refactor num_steps_
5fbb1dd5
docs
6299fb24
fix formulas
4e866509
fix formula
50c69359
correct comment
5c642dd3
note about replicating t5x
1b14a28c
Merge pull request #2 from NouamaneTazi/inverse-sqrt-lr
5e811b66
quick fix for upper triang masked softmax cuda kernel for seq_len < 8192
5365f41f
Merge pull request #3 from NouamaneTazi/large-seqlen-kernels
98749637
Use torch.multiprocessing.set_start_method('spawn')
c41cc5e0
skip_warmup on __setstate__
6732bc9b
Copy preliminary UL2
ab29faf5
DeepSpeed compat
9328ad2c
DS Group compat
351f4f24
Adapt eval for denoiser
abc19b83
Simpler padding
816c32d1
Fix sampling
bdbd54a0
Switch padding
cacf267c
Merge pull request #4 from TurkuNLP/ul2
47691320
Update sampling
557b09ce
Update UL2
a6f69bf2
Add get_samples_mapping
d0d277fe
Import math
3f29df89
Fix prefixlm
52073863
tmp
9490e50e
Merge branch 'main' into tmp
9c8d02ca
Revert UL2 Tokenizer Changes
6936afba
Merge pull request #7 from TurkuNLP/tmp
a1088c1c