Fix deepspeed prefix-lm (#107)
* Fix prefix-LM pretraining when using DeepSpeed
* Fix: use `args` instead of `self._args`
* Set `attn_mask` on the model first, then build the model (sketched below)
* Fix: enforce that we pass down a tuple instead of a generator (sketched below)
* Attention mask does not need to be transposed
* Add a temporary hack as a workaround
* Remove the temporary hack
* Skip the prefix test, as PP > 1 does not yet work with DeepSpeed
* Unskip the prefix test
* Merge branch 'main' into thomas/fix_deepspeed_prefix
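
A minimal sketch of the build-order fix, using hypothetical names (`TinyPrefixModel`, `build_causal_mask`, and this use of `args.attn_mask` are illustrative, not the repository's exact API): the point is that an attribute read by the model's constructor has to be populated before the model is built.

```python
# A minimal sketch, assuming the model reads a shared attention mask at
# construction time. All names here are hypothetical, not the repo's API.
import torch
from types import SimpleNamespace

args = SimpleNamespace(seq_length=8, attn_mask=None)

class TinyPrefixModel:
    def __init__(self, args):
        # The constructor captures the mask. Building the model before
        # assigning args.attn_mask would leave this reference as None.
        assert args.attn_mask is not None, "set args.attn_mask before building the model"
        self.attn_mask = args.attn_mask

def build_causal_mask(seq_length):
    # Lower-triangular causal mask; True marks positions that are masked out.
    return torch.tril(torch.ones(1, 1, seq_length, seq_length)) < 0.5

# Correct ordering: set the mask first, then build the model.
args.attn_mask = build_causal_mask(args.seq_length)
model = TinyPrefixModel(args)
```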
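
And a minimal sketch of the tuple-vs-generator fix: a pipeline engine needs each stage's inputs to be an indexable, re-readable sequence, which a generator is not. The helper name below is hypothetical; the pattern is simply materializing the iterable with `tuple(...)` before handing it down.

```python
# A minimal sketch, not the repository's code: wrap lazily produced tensors
# in tuple(...) so the pipeline engine can index and re-read them.
import torch

def make_stage_inputs(tensors):
    """Return pipeline-stage inputs as a tuple, never as a generator."""
    # A generator would be consumed once and cannot be indexed;
    # tuple(...) materializes the iterable exactly once.
    return tuple(t.contiguous() for t in tensors)

if __name__ == "__main__":
    tokens = torch.zeros(2, 8, dtype=torch.long)
    attention_mask = torch.ones(2, 1, 8, 8, dtype=torch.bool)
    inputs = make_stage_inputs(t for t in (tokens, attention_mask))
    assert isinstance(inputs, tuple) and len(inputs) == 2
```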