Megatron-DeepSpeed
da31db64 - Fix deepspeed prefix-lm (#107)

Commit (4 years ago)
Fix deepspeed prefix-lm (#107)

Squashed commits:

* Fix pretrain prefix lm using deepspeed
* Fix: self._args to args
* First set attn_mask in model and then build model
* Fix: enforce that we pass down tuple instead of generator (see the sketch after this list)
* Attention mask does not need to be transposed
* BIGGEST HACK EVER
* Remove BIGGEST HACK
* Skip prefix test as PP>1 doesn't work yet on deepspeed
* Unskip prefix test
* Merge branch 'main' into thomas/fix_deepspeed_prefix
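One of the squashed commits enforces that a tuple, not a generator, is passed down. DeepSpeed's pipeline engine expects each micro-batch to be a tuple of tensors it can index and move between stages, so a lazy generator has to be materialized first. The sketch below illustrates that constraint only; `to_pipeline_inputs` and the toy tensors are hypothetical and not part of the actual patch.

```python
# A minimal sketch of the tuple-vs-generator constraint, not the actual patch.
# DeepSpeed's pipeline engine expects each micro-batch to arrive as a tuple of
# tensors; a lazy generator cannot be indexed or re-iterated between stages.
import torch

def to_pipeline_inputs(tensors):
    """Hypothetical helper: materialize an iterable of tensors into a tuple."""
    out = tuple(tensors)
    assert all(torch.is_tensor(t) for t in out), "pipeline inputs must be tensors"
    return out

# Toy example: a generator of tensors (e.g. produced by map/zip in a batch fn) ...
lazy = (t for t in (torch.ones(2, 4, dtype=torch.long), torch.zeros(2, 4)))
# ... is forced into a concrete tuple before being handed to the engine.
inputs = to_pipeline_inputs(lazy)
```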