Possible fix to make AMP work with DDP in the trainer (#4728)
* Manually set the device in the trainer arguments
* Check that the current device is CUDA before calling `set_device`
* Explicitly set the GPU ID when using a single GPU
This addresses https://github.com/huggingface/transformers/issues/4657#issuecomment-642228099
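The device-setup logic described by the bullets above could look roughly like the following sketch. This is a hypothetical illustration assuming PyTorch's `torch.cuda` APIs, not the actual Trainer code; the helper name `setup_device` and its `local_rank` parameter are made up for the example.

```python
# Hypothetical sketch of the fix described above; assumes PyTorch.
import torch


def setup_device(local_rank: int = -1) -> torch.device:
    if local_rank == -1:
        # Single-process case: pick GPU 0 explicitly when CUDA is
        # available, instead of the ambiguous bare "cuda" device.
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    else:
        # DDP case: each process owns one GPU, indexed by its local rank.
        device = torch.device("cuda", local_rank)
    # Only call set_device when the chosen device is actually CUDA,
    # so CPU-only runs don't crash.
    if device.type == "cuda":
        torch.cuda.set_device(device)
    return device
```

Guarding `set_device` behind the `device.type == "cuda"` check is what lets the same code path run unchanged on CPU-only machines.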