Reweighting strategy for prefix LM (#190)
* First test to unbias the loss for prefix LM
* Woops
* Add the same code for non-DeepSpeed mode
* Improve testing
* Woops
* Test moving it inside?
* This changes the normalization factor in loss computation
* Fix
* Woops
* Better refactoring of loss normalization
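
For context, a minimal sketch of how reweighting a prefix LM loss mask changes the normalization factor. This is not the code from this PR: the function names are hypothetical, and the 1 / (i + 1) weighting assumes the prefix length is sampled uniformly, so position i is a target with probability proportional to i + 1.

```python
def prefix_lm_loss_mask(seq_length, prefix_length):
    """Loss mask for a prefix LM: 0.0 on prefix positions, 1.0 on target positions."""
    return [0.0 if i < prefix_length else 1.0 for i in range(seq_length)]


def reweight_by_position(loss_mask):
    """Scale each mask entry by 1 / (i + 1).

    ASSUMPTION (illustrative only): with a uniformly sampled prefix length,
    position i contributes to the loss with probability proportional to i + 1,
    so weighting by the inverse frequency evens out the contribution of
    each position over many samples.
    """
    return [m / (i + 1) for i, m in enumerate(loss_mask)]


def masked_mean_loss(token_losses, loss_mask):
    """Weighted mean of per-token losses, normalized by the total mask weight.

    Once the mask holds fractional weights, the normalization factor is the
    sum of those weights rather than the number of unmasked tokens.
    """
    if len(token_losses) != len(loss_mask):
        raise ValueError("token_losses and loss_mask must have the same length")
    total_weight = sum(loss_mask)
    if total_weight == 0:
        return 0.0  # nothing to train on in this sample
    weighted_sum = sum(l * m for l, m in zip(token_losses, loss_mask))
    return weighted_sum / total_weight


# Usage: the same per-token losses, with and without reweighting.
token_losses = [1.0, 2.0, 3.0, 4.0]
mask = prefix_lm_loss_mask(seq_length=4, prefix_length=2)
plain_loss = masked_mean_loss(token_losses, mask)
reweighted_loss = masked_mean_loss(token_losses, reweight_by_position(mask))
```

With a plain 0/1 mask the normalization factor is the number of target tokens; after reweighting it becomes the sum of the weights, which is why the loss values differ between the two calls.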