Megatron-DeepSpeed
b3cf1755 - Reweighting strat for prefix lm (#190)

Commit
4 years ago
Reweighting strat for prefix lm (#190)

* First test to un bias the loss for prefix lm
* Woops
* Add same code for not deepspeed mode
* Improve testing
* Woops
* Test moving it inside?
* This changes the normalization factor in loss computation
* Fix
* Woops
* Better refactoring of loss normalization
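
The commit message says only that the normalization factor in the loss computation changes; it does not spell out the scheme. As a rough illustration of why normalization matters for a prefix LM, the sketch below compares two ways of averaging a masked per-token loss. Everything in it is an assumption for illustration: the function names `token_average_loss` and `sequence_average_loss`, the list-of-lists layout, and the per-sequence reweighting itself are hypothetical and are not taken from the commit.

```python
# Hypothetical sketch, not the code from this commit.
# In a prefix LM the loss is typically computed only on tokens after the
# prefix, so sequences with short prefixes contribute more target tokens.

def token_average_loss(losses, loss_mask):
    """Average over every unmasked token in the batch (global token count)."""
    total = sum(l * m
                for row_l, row_m in zip(losses, loss_mask)
                for l, m in zip(row_l, row_m))
    count = sum(m for row in loss_mask for m in row)
    return total / count


def sequence_average_loss(losses, loss_mask):
    """Average within each sequence first, then across sequences.

    Assumed reweighting: every sequence gets equal weight regardless of
    how many target tokens it has. Fully masked sequences are skipped.
    """
    per_sequence = []
    for row_l, row_m in zip(losses, loss_mask):
        n_targets = sum(row_m)
        if n_targets == 0:
            continue
        per_sequence.append(sum(l * m for l, m in zip(row_l, row_m)) / n_targets)
    return sum(per_sequence) / len(per_sequence)


# Two sequences of length 4: the first has a prefix of 1 token,
# the second a prefix of 3 tokens (mask 0 = prefix, 1 = target).
losses = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
loss_mask = [[0, 1, 1, 1], [0, 0, 0, 1]]

print(token_average_loss(losses, loss_mask))     # dominated by the first sequence
print(sequence_average_loss(losses, loss_mask))  # both sequences weighted equally
```

With a global token average, the sequence with the shorter prefix dominates the result; averaging per sequence first removes that dependence on prefix length. Whether the commit uses this particular normalization or a different one is not stated in the message.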