feat(granitemoe*): Remove logits upcast when computing loss (#42753)
* feat: Remove logits upcast when computing loss
When the CausalLM loss is used, the loss function utils already upcast
the logits to float32, so the extra upcast in the model forward pass is
redundant.
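A minimal sketch of why the second upcast adds nothing, assuming the
loss helper upcasts to float32 itself. `causal_lm_loss` is a simplified,
hypothetical stand-in written for this note, not the library's actual
loss function:

```python
import torch
import torch.nn.functional as F


def causal_lm_loss(logits, labels, ignore_index=-100):
    # Hypothetical stand-in for the loss helper: the upcast happens here, once.
    logits = logits.float()
    # Shift so that position t predicts token t + 1.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )


torch.manual_seed(0)
# Low-precision logits, as a model running in bfloat16 would produce them.
logits_bf16 = torch.randn(2, 5, 11).to(torch.bfloat16)
labels = torch.randint(0, 11, (2, 5))

# Old behaviour: the model upcasts before handing logits to the loss.
loss_with_model_upcast = causal_lm_loss(logits_bf16.float(), labels)
# New behaviour: the model passes logits through and the loss upcasts.
loss_without_model_upcast = causal_lm_loss(logits_bf16, labels)
```

Both calls compute the loss on the same float32 values, so dropping the
model-side upcast leaves the loss unchanged while avoiding an extra
float32 copy of the logits before the loss is computed.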
https://github.com/huggingface/transformers/issues/42709
Branch: GraniteOptionalUpcast-42709
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* chore: make fix-copies
https://github.com/huggingface/transformers/issues/42709
Branch: GraniteOptionalUpcast-42709
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>