transformers
feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart
#22591
Merged
