distributed submit script fixes for model_args (#1135)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1135
- Always use parse_known_args, so that model_args can be separated from the submit script's own arguments.
- Pass model_args to TrainerWrapper.
- Provide a "device" argument (defaulting to CUDA).
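
As a rough illustration of the parse_known_args change, here is a minimal sketch and not the actual submit script. The parser, the function name `parse_submit_args`, and the flag `--some_model_flag` are hypothetical and only mirror options named in this message. Recognized options land in `args`; everything else is left over as `model_args` to be forwarded.

```python
import argparse


def parse_submit_args(argv):
    # Hypothetical parser: it only mirrors a few options mentioned in this message.
    parser = argparse.ArgumentParser(description="sketch of a submit-script parser")
    parser.add_argument("--model", type=str, required=True)
    parser.add_argument("--nodes", type=int, default=1)
    parser.add_argument("--ngpus", type=int, default=8)
    # Assumption: the device option is a plain string that defaults to "cuda".
    parser.add_argument("--device", type=str, default="cuda")
    # parse_known_args returns (namespace, leftover argv) and does not raise
    # an error on options it does not recognize.
    args, model_args = parser.parse_known_args(argv)
    return args, model_args


args, model_args = parse_submit_args(
    [
        "--model=torchbenchmark.models.resnet50.Model",
        "--nodes=2",
        "--some_model_flag=1",  # hypothetical model-specific flag
    ]
)
print(args.nodes, args.device, model_args)
```

In this sketch the leftover list is what would be handed on to the model (for example through TrainerWrapper), assuming the model does its own parsing of those flags.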
Tested with:
```
python -m torchbenchmark.util.distributed.submit --model=torchbenchmark.models.resnet50.Model --partition=train --job_dir=/data/home/dberard/logs --nodes=2 --ngpus=8
```
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D39076188
Pulled By: davidberard98
fbshipit-source-id: 5a21b54a4a165ef2f0b72c9380757cd8d4cf557d